Response required


Blogs and online comments can provide valuable feedback on newly published
research. Scientists need to adjust their mindsets to embrace and respond to
these new forums for debate.


You may have seen claims that scientists at NASA have discovered a
bacterium that can replace the phosphorus in its DNA with arsenic. You may
have heard that this could help the hunt for aliens. You may even have
heard that the ‘arsenic bacterium’ is itself an alien. What you will not
have seen or heard is a detailed response from NASA and the scientists
involved to online criticism of their work. In the face of worldwide
attention on their paper (F. Wolfe-Simon et al. Science
doi:10.1126/science.1197258; 2010), which NASA and the team deliberately
courted, the researchers have stuck their heads in the digital sand.

In response to the arsenic bacterium claims, bloggers and researchers
raised serious and thoughtful reservations about the paper’s methodology
and findings. But the authors say that they will not engage with these
critics, or with science journalists drawn to the controversy, because such
discussion should be moderated in the peer-reviewed literature. Meanwhile,
they are urging other scientists to work to replicate their results — a
process that will take many months. “We are not going to engage in this
sort of discussion,” Felisa Wolfe-Simon, the paper's lead author and a NASA
astrobiology research fellow at the US Geological Survey in Menlo Park,
California, told one Nature reporter. “Any discourse will have to be
peer-reviewed in the same manner as our paper was, and go through a vetting
process so that all discussion is properly moderated.”

Purists who hold peer review as the casting vote in such debates will read
her words with approval. But the problem is that Wolfe-Simon's reticence is
the polar opposite of the fanfare with which NASA trailed her discovery to
the public. In an advance press advisory on 29 November, NASA trumpeted an
“astrobiology finding that will impact the search for evidence of
extraterrestrial life”. At a press conference to coincide with the paper's
publication, the authors reported a more down-to-Earth, but nonetheless
radical, discovery, claiming that an arsenic-tolerant bacterium had
rewritten the rules of life as we know them.

Such claims were always likely to bring intensive scrutiny, especially as
many scientists think that NASA has form for making extravagant claims in
the field of astrobiology. Within two days of the paper appearing, Rosie
Redfield, a microbial geneticist at the University of British Columbia in
Vancouver, Canada, published a long and detailed critique of what she
described as the paper’s methodological shortcomings on her blog
(go.nature.com/ddesjw). She was one of several researchers who used their
blogs to question whether the paper's data supported its claims. It was at
this point that the authors, previously happy to promote their findings,
refused to answer further questions and retreated behind the walls of peer
review.

Formal peer review does give criticized authors time to think critically
and carefully, and it is a good way to filter out rubbish. But in this case,
much of the criticism was already coming from the researchers’ peers. And
it should be remembered that peer review as conducted by journals is itself
full of differing opinions, and is not the only way to crystallize truth
from such disputes. In this instance, a prompt and explicitly provisional
response from the authors would have been a better approach, particularly
given the way they encouraged the original attention.

Nature strongly encourages post-publication discussion on blogs and online
commenting facilities as a complement to — but not a substitute for —
conventional peer review. Yet it is true that so far online commenting and
blogs have generally contributed little. Of the thousands of papers
published every year, only a few attract substantive comments. And,
regrettably, it seems that even those meagre comments rarely spark debate: a
study of medical articles in the BMJ last August found that few authors
bothered to respond to online criticisms of their papers (P. C. Gøtzsche et
al. Br. Med. J. 341, c3926; 2010).

Bloggers and online commentators have an important part to play 
in the assessment of research findings, and many researchers’ blogs, in 
particular, contain better analyses of the true significance of a scientific 
finding or debate than is seen in much of the mainstream media. Science
journalists who repeated NASA's claims on the arsenic bacterium, and did
not tap into the widespread criticisms, did little to defend themselves
from claims of reporting by press release. Blogging scientists, meanwhile,
should remember that such informal forums do not
excuse insults and casual discourtesy towards colleagues — especially 
those being urged to respond. 

In the end, the scientific truth will prevail, as it usually does. In the 
meantime, researchers must accept some harsh truths about the speed 
and spread of digital criticism.


Great expectations 


If Europe’s new states are to follow the research 
roadmap, capacity is as essential as funding. 


To win a national bid to host a new European research facility is, for
academics, akin to being chosen to hold the Olympic Games. The warm glow of
prestige is matched by the flow of hard cash to regenerate land and
communities, while the rush of the best scientific minds to the new
equipment can give a major boost to national research performance.

So the Czech Republic, Hungary and Romania are rightly proud to have beaten
France and the United Kingdom to jointly host the €800-million
(US$1-billion) Extreme Light Infrastructure (ELI), a consortium of three
independent laser facilities to deliver images at the atomic level.

The project is part of a roadmap for European research infrastruc- 
ture — a wish list of research facilities drawn up by the best scientific 
minds across the European Union (EU) — and the first to be built in 
newer, and often less-well-off, member states. 

The ELI is on track to begin construction early next year, but the 
real test starts now. To build it, the host countries will use EU struc- 
tural funds — a multi-billion-euro pot established to help narrow 
the economic and social disparities between member states. Earlier 
this year, Maire Geoghegan-Quinn, European Commissioner for 
Research, Innovation and Science, said she hoped to divert €86 billion
of EU structural funds to building Europe's “knowledge economy”,
including research infrastructure. In the past, it has been difficult to 
track how countries have spent such structural funds, and this lack of 
transparency has led to a sense of mistrust. As a result, policy wonks 
in established member states are questioning the merits of using struc- 
tural funds to support research in Europe, such as on the ELI. 

Poland is a major beneficiary of structural funding for research 
infrastructure, and has been allocated €1 billion over the period 
2007-2013. Critics of the approach were handed ammuni- 
tion earlier this year, when Poland invited a panel of interna- 
tional scientists to assess the research infrastructure it wants to 
build in the future, partly using structural funds. The country 
should be applauded for its scientifically responsible approach. 
But some of the experts on the panel have some concerns about 
the scientific quality of the country’s proposals. 

Some projects look more like plans to create networks between 
national universities, they say, or attempts to build and strengthen 
national industries, rather than to develop cutting-edge research infra- 
structures. One project aims to build a knowledge alliance between 
several universities to help develop foundry and metallurgy industries, 
but contains no ideas for what research would be conducted in this 
area. Instead, it focuses on how the institutions can be linked up easily, 
sited as they are along major highways. 

Out of a total of 60 points that each proposal could be awarded, the
highest mark was 45.3; the majority of projects came in at just over half
marks. As one scientist on the assessment panel (from a research-inten- 
sive member state) commented, only projects awarded the equivalent 
of 54 points or more would be considered for funding in their 
home nation. There are also widespread concerns in Europe that the 
new member states lack the experience to manage large infrastructure 
projects, including handling budgets, procurement and legal aspects. 
Insiders at the ELI say that this lack of experience is beginning to show, 
in preparing accounts for example. 

The European Research Advisory Board, an independent advisory 
committee to the European Commission, echoes these fears in a report 
published in October. The board is concerned that the power given 
to member states, to decide which projects to fund with structural 
funds, directs investment towards building national capacity, rather 
than cutting-edge research. 

The board recommends that some of the structural funds be held back in a
central pot, to be allocated to projects judged to be of a high standard by
experts, and which would serve pan-European needs. Although this approach
may be better for research as a whole, it doesn’t address the difficulties
faced in the new member states.

These difficulties are not confined to the 
newer member states, as those countries involved in building ITER, 
the fusion test reactor struggling to life near Cadarache, France, have 
learnt the hard way. Legal and managerial expertise that is crucial 
to make such projects work must be actively sought and shared. 
For example, the European Investment Bank’s initiative to help new 
member states prepare financial proposals for major projects could 
be extended to see projects through to later stages. And a portion of 
structural funds earmarked for research infrastructure could be set 
aside to train scientists as managers. 

Structural funds for research infrastructure should continue to flow, 
but more international support is needed to ensure that the structures 
built around them are sound.


Asbestos scandal 


Irresponsible policies could cause an epidemic 
of malignant lung disease. 


Viewed through an electron microscope, asbestos fibres look like thin glass
straws, some no more than a fraction of a micrometre wide. If inhaled, they
penetrate the soft alveoli of the lungs and the membranes that line the
chest cavity. And there they stay. Over time, damaged cells can cause a
malignant disease called mesothelioma, which often kills people, horribly,
less than a year after diagnosis.

Before the widespread industrial use of asbestos began in the late
nineteenth century, malignant mesothelioma was unheard of, yet it is now
responsible for tens of thousands of deaths around the world every year.
After the link between asbestos exposure and the disease was convincingly
made in 1960, responsible nations eventually took strong measures to remove
the mineral from commercial products and to halt mining and export. Less
responsible nations did not; this is a scandal that deserves wider
attention.

The United States has still not banned asbestos, despite the millions of
dollars spent to clear it from homes and from communities near mines. And
Canada has been criticized for plans to expand asbestos mining operations,
which export the material to India, Indonesia and the Philippines. Although
Canada enforces strict guidelines on asbestos use at home to protect its
own people, those in countries to which it sends the mineral have little or
no protection. Asbestos exported from
Canada and other countries including Russia, Brazil and Kazakhstan 
is routinely mixed into building materials and consumer products, 
prized for the same durability that makes it troublesome for living tis- 
sue. Owing to the long time between exposure and the onset of disease, 
30 years or more, the asbestos trade in North America and elsewhere is 
creating an epidemic that may take decades to peak and subside. 

The minerals industry has long tried to convince regulators that 
white asbestos — or chrysotile — is safe when handled properly. It 
argues that only the already controlled forms — blue and brown asbes- 
tos, known collectively as amphibole — are of concern. 

To support this, industry advocates point to scientific data and studies. 
Yet although the relevant literature is a mire of conflicting results, this 
should not be seen as an endorsement of their position. Rather, it reflects 
a string of industry-sponsored studies designed only to cast doubt on 
the clear links between chrysotile and lung disease. These are familiar 
tactics and several countries, including Britain, have seen through them 
and made the correct decision to ban all forms of asbestos, all of which 
have been proven to be carcinogenic in humans. 

Meanwhile, researchers are finding new causes for concern with other 
natural fibrous minerals such as erionite (see page 884). Complacency is 
the problem. Much of the developed world has seen asbestos removed 
from public spaces, leaving in many minds a false sense of security. The
public should once again be made aware of the risks associated with
exposure to mineral fibres, as well as some man-made fibres. And
governments must ban the extraction, processing and use of materials that
can cause serious disease.

Drug development needs
a new brand of science


We need to break with the past to develop new medicines, says
Garret FitzGerald. An interdisciplinary NIH centre points the way.

Discuss this article online at: go.nature.com/u9jfqy


Last week, the US National Institutes of Health (NIH) voted to launch a
National Center for Advancing Translational Sciences, focusing on
translational medicine and therapeutics (TMAT), the growing field that aims
to speed therapies from the laboratory to the clinic. NIH director Francis
Collins called the decision “momentous” — a “disruptive innovation on an
institutional scale” — and I think he is right. Only a translational
approach can address the fact that the current model of drug discovery and
development is unsustainable. Paradoxically, as we have witnessed a
successful revolution in drug discovery, a crisis has emerged in drug
development. Targets, and the chemistry needed to probe them, can be
selected more rationally than ever — yet more and more candidate drugs are
proving expensive failures.

One reason is that too many steps are pursued in specialist isolation, in
both academia and industry. Too few people can bridge the translational and
interdisciplinary divides. This has led to crucial and expensive mistakes
in phase II of drug development — when there is often a failure to see an
impact on efficacy, a propensity to ignore risks, or a danger of making
errors in dose selection for phase III.

The new NIH centre promises to catalyse a much-needed restructuring of the
drug-development process. The centre can foster training by absorbing the
Clinical and Translational Science Awards (CTSAs) and their educational
infrastructure. This will allow scientists to partner in a modular approach
to drug development, in which expertise is drawn from distinct sectors and
regions as needed to address particular therapeutic challenges.
Furthermore, the broad CTSA-supported programmes and infrastructure — from
preclinical science to community outreach — could be harvested to support a
more efficient approach to drug development, approval and dissemination.

Why has the need for such a radical change emerged? Thirty years ago, the
best clinical pharmacology units housed experts from a range of
disciplines. Cell biologists worked side by side with colleagues studying
model systems and those involved in mechanistic studies of physiology,
disease and drug action in humans and pharmacokinetics. Others were trained
in chemistry, statistics and toxicology. Blending these heterogeneous
talents fostered what we would now call interdisciplinary science, and, in
the context of drug development, T1 translational research.

However, as the economics of academic departments shifted, clinical
pharmacology fell from favour. Even the term clinical pharmacology has lost
its lustre, and now covers only some of what we need. To attract the best
and brightest, we need a new brand, backed by funders, academics and
industry. Potential students must perceive the field to be hot.

So what shall we call this interdisciplinary, translational endeavour? It
is difficult to imagine anyone rushing to join something called “T1
translational research”. “TMAT”, on the other hand, captures the fashion
for translation, places the discipline in the heart of medicine and
indicates the focus on developing novel therapeutics. Adoption of this term
by the NIH follows a training programme in TMAT funded by the UK Wellcome
Trust. Now we need to realize the potential of this brand and push the idea
more widely.

The NIH centre will signal, both to Congress and the biomedical 
research community, the intimate connection between fundamental 
science and the accelerated delivery of cures to the general public. This 
is not a zero-sum game: success of translation requires investment in 
basic science. By developing sustainable career structures in TMAT, the 
centre can reverse the flow of bright young scientists into specialist silos. 
Joint investments in training, infrastructure and 
programmes would ensure that the efforts of the 
new centre would improve, not compete with, the 
translational efforts of disease-focused institutes 
and centres within the NIH. 

The new TMAT centre could also act as a visible
point of contact for extramural partners, includ- 
ing industry, charitable foundations and the US 
Food and Drug Administration, to buy into the 
restructuring required to move to a more modular 
approach to drug discovery and development. A 
looser, more distributed model spanning pharma, 
biotech and academia could then draw on knowl- 
edge more easily, and apply it more efficiently. 

It is a big challenge, and two particular obsta- 
cles come to mind. First, we must revise how we 
reward ideas. At present, defence of intellectual 
property relies on patents on the composition of matter, usually mol- 
ecules, most of which never become approved drugs. To make sure that 
they do, many people with diverse skill sets have to work effectively 
together. Inside a company, it is easy to reward everybody involved. As 
companies fragment, we should consider new models of intellectual 
property. Perhaps the financial rewards of a patent should be postponed
until a drug is a profitable success — and a formal mechanism found to 
distribute rewards among all those who helped to make it happen. 

Second, we will need common standards of data protection and pri- 
vacy, and shared infrastructure that allows secure and compliant sharing 
of diverse types of information, including clinical data, across countries 
and sectors. This is the foundation upon which a global TMAT enter- 
prise can be established. In some ways, this is the greatest challenge of 
all, but it can be done. As T. S. Eliot said: “Only those who will risk going
too far can possibly find out how far one can go.” See News p.877


Garret FitzGerald is director of the Institute of Translational Medicine 
and Therapeutics at the University of Pennsylvania in Philadelphia. 
e-mail: garret@exchange.upenn.edu 
Final frontier of
flowering plants 


Half of the world’s yet-to-be- 
discovered flowering plant 
species may already have been 
collected, and now languish in 
herbarium cabinets. 

While reclassifying varieties 
of Strobilanthes, a genus of 
purple-flowered plants from 
Asia, Robert Scotland of the 
University of Oxford, UK, and 
his colleagues noticed that 
many of the 60 species they 
described had been collected 
many years before. This lag 
ranged from 1 to 210 years and 
averaged more than 30 years 
for more than 3,000 species 
in 6 plant genera, including 
Strobilanthes. Just 16% of these 
plants were classified within 
5 years of discovery. 

If this trend holds for other 
flowering plants, 47% to 
66% of the planet's estimated 
70,000 undiscovered species 
are waiting to be unveiled in 
herbaria. 

Proc. Natl. Acad. Sci. USA 
doi:10.1073/pnas.1011841108 
(2010) 

For a blog entry on this
research, see go.nature.com/veqaig.


Caterpillars
whistle for safety


When under attack, walnut sphinx caterpillars (Amorpha juglandis;
pictured) whistle. An 1868 Canadian Entomologist paper, “Musical larvae,”
first reported these shrieks, but their purpose wasn't clear.

Jayne Yack at Carleton University in Ottawa, Canada, and her team now show
that the whistle, produced through openings along the body called
spiracles, is a defence against predators. Simulated attacks with blunt
tweezers caused the caterpillars to pull their heads back, forcing air
through two of the spiracles in a succession of squeaks.

When confronted by their real predators, yellow warblers, the caterpillars
whistled each time the birds swooped in for attack, repelling multiple
assaults until the warblers gave up.

J. Exp. Biol. 214, 30-37 (2011)
For videos, see go.nature.com/zgeqyc.


Twisted tale of snail evolution


Dextral snail shells coil rightwards, and sinistral shells coil leftwards.
Sinistral Satsuma snails cannot mate with right-coiling Satsuma species,
leading scientists to wonder how sinistrality could have spread through
dextral populations. Masaki Hoso of Tohoku University in Sendai, Japan, and
his colleagues show that sinistrality has arisen independently multiple
times in Satsuma, and more often where snakes in the Pareatidae family
occur.

The team found that the Pareas iwasakii snake, which preys on the molluscs,
must stick to right-coiling species as its jaws are specialized for
grasping them. (Snake jaw, with extra teeth on the lower mandible,
pictured.) That gave sinistral individuals an adaptive advantage, allowing
left-coiling species to emerge.

Nature Commun. doi:10.1038/ncomms1133 (2010)
For a longer story on this research, see go.nature.com/9fetev.


Pressed to
breaking point


Every day in labs around the world, a technique using high-frequency sound
waves — ultrasonication — is used to break up carbon nanotubes. But no one
really understands the underlying mechanism. Kyung-Suk Kim at Brown
University in Providence, Rhode Island, and his collaborators have shed
some light on the interplay between nanotubes and the minute bubbles
created by the sound waves under water.

When the bubbles implode, tubes in the water near them are suddenly
compressed along their lengths. The tubes buckle, and some atoms are
knocked off, weakening the tube until it ultimately breaks.

Proc. R. Soc. A doi:10.1098/rspa.2010.0495 (2010)


GENETICS 


Sex and the social 
slime mold 


A single gene is sometimes 
all it takes to change a slime 
mold’s sexual identity. 

The social amoeba 
Dictyostelium discoideum 
has three different sexes 
— members of one sex, or 
‘mating type’, can fuse with
either of the other two to form 
giant, dormant cysts. But little 
is known about what genes 
determine the sexual identity 
of a slime mold.

Gareth Bloomfield of the 
Medical Research Council 
molecular biology lab in 
Cambridge, UK, and his 
colleagues found a region of 
the D. discoideum genome 
that differed among sexes. 
Deleting a gene from this 
region prevented mating- 
type I from coupling with 
mating-type II; reintroducing 
the gene restored normal 
sexual orientation. Meanwhile, 
swapping sex genes from one 
mating type to another caused 
the amoebae to switch sexual 
partners. 

Science 330, 1533-1536 (2010) 


Mother’s dinner, 
daughter’s nose 


The smell of mouse mothers’ 
food influences the olfactory 
anatomy of their pups, 
and primes them to prefer 
the same flavours as their 
mothers. 

Josephine Todrank at 
the University of Colorado, 
Denver, and her colleagues 
studied lines of mice in which 
select olfactory sensory 
neurons that responded to 
smells such as cherry or mint 
were tagged with the gene for 
green fluorescent protein. 
The mothers were given 
scented food while either 
gestating or nursing their 


litters, or during both phases. 
When their pups were tested 
at 20 days old, fluorescence 
revealed larger glomeruli
— bundles of synapses —
formed by neurons specific 
to the smells added to their 
mother’s food. Pups also 
preferred the smells of the 
food their mothers ate. 

Such preferences could 
predispose animals to choose 
familiar and safe foods, 
although in humans they 
could backfire to plant the 
seed of preference for alcohol 
or unhealthy foods, the 
authors say. 

Proc. R. Soc. B doi:10.1098/ 
rspb.2010.2314 (2010) 


Currency circuitry 


Modern anti-counterfeiting 
features on banknotes are 
getting more sophisticated, 
ranging from complex and 
colourful watermarks to 
holograms and foil strips. 
Now Ute Zschieschang of 
the Max Planck Institute 
for Solid State Research in 
Stuttgart, Germany, and 
her colleagues have added 
yet another weapon to the
arsenal: trackable digital
circuits.

The researchers fabricated
low-voltage organic
transistors on the surface
of a €5 note (pictured),
using a 3-nanometre-thick 
insulating layer made of 
aluminium oxide and 
octadecylphosphonic acid 
that could be deposited 
without damaging the 
surface of the banknote. A 
total of 92% of the deposited
transistors were functional —
a high enough proportion for
the circuits to work reliably.
Adv. Mat. doi:10.1002/adma.201003374 (2010)


COMMUNITY

The most viewed papers in science


Making maoecrystal V

Highly read in November on pubs.acs.org


The complex molecule maoecrystal V has been synthesized in the laboratory,
after six years of intense effort by high-profile chemists. Zhen Yang and
his colleagues at Peking University in Beijing created the sought-after
compound — which shows potent activity against cancer cells — in a concise
16-step synthesis. It was originally extracted from a Chinese herb (Isodon
eriocalyx) that has long been used as a folk medicine to treat flu and
inflammation, and has already produced a number of potential anticancer
agents. By varying the laboratory synthesis, chemists will be able to make
and test closely related structures that may prove better medicines than
maoecrystal V itself.

J. Am. Chem. Soc. 132, 16745-16746 (2010)


It’s never too early 
to get sequenced 


A developing baby’s entire 
genome is hidden in its 
mother’s blood, potentially 
offering a non-invasive test for 
congenital diseases. Dennis Lo 
of the Chinese University of 
Hong Kong and his colleagues 
sequenced billions of DNA 
base pairs from the plasma of 
a pregnant woman and then 
developed a way to distinguish 
her DNA sequences from the 
fetus’s. 

Both parents carried
a single mutation for
β-thalassaemia, a rare blood
disorder caused by two faulty
copies of the gene HBB. Lo's
analysis demonstrated that
the father had passed on his
mutation, but the mother
had given the fetus a healthy
copy of HBB, sparing it from
β-thalassaemia. Such genetic
screening could replace 
invasive prenatal diagnostic 
tests such as amniocentesis. 
Sci. Transl. Med. 2, 61ra91 (2010)
For a longer story on this
research, see go.nature.com/djxvga.


PLANETARY SCIENCE 


Impacts sent bling 
to early Earth 


Call it a gift. Late in the Solar 
System’s formation, a shower 
of objects up to the size of 
Pluto delivered to Earth a large 
quantity of rock containing 
gold, platinum and other 
elements that bind readily 
with iron. Researchers believe
these elements were added
to the mantle late in Earth’s
development, because if they 
had been present when the 
planet was molten, they would 
have sunk to its core, with iron. 
William Bottke of the 
Southwest Research Institute 
in Boulder, Colorado, and his
colleagues used abundances of 
iron-loving elements on Earth, 
the Moon and Mars to model 
how later impacts from large 
objects could have replenished 
reserves in the planets’ mantles. 
The findings may also explain 
the sizes of the oldest craters on 
the Moon and Mars. 
Science 330, 1527-1530 (2010) 
For a longer story on this
research, see go.nature.com/bbdewm.



16 DECEMBER 2010 | VOL 468 | NATURE | 871 
© 2010 Macmillan Publishers Limited. All rights reserved 


SEVEN DAYS


Crop catalogue 

A global search to gather the 
wild relatives of essential food 
crops such as wheat, barley 
and rice has been launched 

by the Global Crop Diversity 
Trust, based in Rome. The 
ten-year initiative, announced 
on 10 December, aims to 
increase food security by 
finding genetic traits that 
might be suited to future 
climates. Samples of wild 
plants will now be conserved 
alongside existing stores of 
domesticated seeds (such as 
the Svalbard Global Seed Vault 
on the Norwegian island of 
Spitsbergen). See go.nature.com/I8mgn2 for more.


Higgs hunt extended
A 15-month shutdown to
upgrade the Large Hadron
Collider is set to be delayed
by a year to the end of 2012.
The extended run will be used
by scientists at the particle-physics
laboratory CERN near Geneva, Switzerland,
to hunt for the elusive Higgs
particle at the collider’s current
collision energies. The plan is
likely to be agreed by CERN’s
management and council in
January. See page 876 for more.


African innovation


Africa is struggling to turn local discoveries into drugs and other
health-care inventions, according to papers produced by the
McLaughlin-Rotman Center for Global Health in Toronto, Canada. The reports,
published by BioMed Central on 13 December, identify 25 ‘stagnant
technologies’ languishing in African health-care institutions, including
several drug candidates and a dipstick test for schistosomiasis. Scientists
have no incentive to commercialize results, there is scant institutional
support for knowledge transfer, and existing regulatory frameworks inhibit
innovation, the papers say. See go.nature.com/py46rh for more.


Private spaceflight success


SpaceX (Space Exploration Technologies Corporation) has become the first
private firm to launch a spacecraft into orbit and return it to Earth. On
8 December, its reusable ‘Dragon’ capsule was launched on a Falcon 9 rocket
from Cape Canaveral, Florida. Completing two orbits, it splashed down in
the Pacific Ocean roughly 800 kilometres west of Mexico. NASA expects the
craft to ferry astronauts, supplies and research materials to the
International Space Station when its shuttle fleet retires next year.
SpaceX, based in Hawthorne, California, hopes to dock Dragon with the
station during its next demonstration launch, scheduled for 2011.


Venus probe flop 

In a bitter disappointment 

for Japan’s space agency, its 
Akatsuki spacecraft failed to 
enter orbit around Venus on 
6 December. The probe was 
intended to monitor the hot 
planet’s atmosphere, but must 
now wait six years for another 
chance to reach orbit. See page 
882 for more. 


POLICY 


NIH access 

A key panel of advisers to 

the US National Institutes of 
Health (NIH) voted last week 
to open the Clinical Center 

— the agency’s huge research 
hospital in Bethesda, Maryland 
— to outside investigators. 
The Scientific Management 
Review Board on 7 December 
recommended extramural 
scientists be given access to 
the facility, where roughly 
1,500 patient studies are in 
progress at any given time (see 
Nature 466, 172; 2010). The 
same board voted to establish a 
translational-medicine centre 
at the NIH (see page 877 for 
more). 


European patent 
Countries in the European 
Union (EU) have broken 
through a decade-long impasse 
over establishing a low-cost 
single European patent 
system. At a meeting of the 
EU competitiveness council 
on 13 December, 11 countries 
agreed on a plan to translate 
EU patents into English and 
one of French or German; 


12 others suggested that they 
would join the proposal. Italy 
and Spain voted against the 
scheme, but countries invoked 
an ‘enhanced cooperation’ 
provision, which allows them 
to progress without attaining 
unanimous agreement. A 
common European patent 
could be in place by the end of 
next year; a formal decision is 
expected in March. 


Anthrax report 

The US National Academy of 
Sciences has delayed releasing 
a long-awaited report on 

the investigation into the 
2001 anthrax attacks, after a 
request by the Federal Bureau 
of Investigation (FBI). The 
report examines the scientific 
evidence used by the FBI 

to accuse microbiologist 
Bruce Ivins of the attacks, 
which killed five people. 
Ivins committed suicide in 
2008. After seeing a draft 
copy, the FBI said hundreds 
more pages of previously 
undisclosed documents 
should be considered by the 
investigation, which will now 
continue until February 2011. 


Cancun climate deal 


United Nations climate talks in 
Cancún, Mexico, ended with 
an agreement by developed 
and developing countries 

to reduce greenhouse-gas 
emissions — largely approving 
commitments made in last 
year’s Copenhagen Accord. 
See page 875 for more. 


Committee chairs 
Ralph Hall (Republican, 
Texas) was on 8 December 
confirmed as the new 
chairman of the US House 
Committee on Science 

and Technology. Hall 
(pictured) has made it clear 
that he will take a hard line 
against attempts to regulate 
greenhouse gases. Fred Upton 
(Republican, Michigan) 
— who has frequently 
supported environmental 
legislation — will chair the 
House Committee on Energy 
and Commerce. 


TREND WATCH 
After last year’s drop in 
research-grant funding, the Irish 
government kept a promise to 
spare researchers further pain in 
its austerity budget for 2011-14, 
announced on 7 December. Total 
funding for basic science has 
flatlined, not including inflation, 
since 2008. But the Department of 
Enterprise, Trade and Innovation 
announced a 12.5% increase in 
its science and technology budget 
compared with 2010. The basic- 
science funding body Science 
Foundation Ireland saw a 7% 
increase in its share. 


NASA chief scientist 


Waleed Abdalati will be 
NASA’s chief scientist from 

3 January, the agency’s 
administrator Charles Bolden 
announced on 13 December. 
A researcher on polar ice 

who worked at NASA for a 
decade until 2008, Abdalati is 
currently director of the Earth 
Science and Observation 
Center at the University of 
Colorado at Boulder. He is 
NASA’s first chief scientist 
since James Garvin, who 
served in the post during 
2004-05. 


Nobel chemist dies 
John Fenn (pictured), who 
shared the 2002 Nobel 

Prize in Chemistry, died on 

10 December aged 93. In 

the late 1980s, he developed 
electrospray ionization, a way 
to gently separate clumped 
proteins into a fine spray of 
individual molecules. This 
method, when combined 

with mass spectrometry, gave 
scientists a tool to quickly 
identify proteins via their mass 
and helped to launch the field 
of proteomics. In 2005, Fenn 
lost a legal battle over the patent 
rights to Yale University in New 
Haven, Connecticut, where 

he developed the technique. 
He had moved to Virginia 
Commonwealth University in 
Richmond in 1994. 


IRELAND’S SCIENCE BUDGET 
Despite cutting €6 billion (US$8.1 billion) from its budget, 
Ireland maintained its science funding. 
[Chart: budget (€ million) by year, from 2008.] 


Biotech bid 

On 8 December, US 
pharmaceutical giant Johnson 
& Johnson issued a long- 
awaited public offer to buy 
Crucell, a biotechnology firm 
headquartered in Leiden, 

the Netherlands. Johnson & 
Johnson, in New Brunswick, 
New Jersey, offered to pay 
€1.75 billion (US$2.3 billion) 
for Crucell, which specializes 
in vaccines and antibody 
therapies. Crucell’s board 

of directors unanimously 
supports the deal, and 
shareholders will vote on the 
matter on 8 February. 


[Chart legend for ‘Ireland’s science budget’, 2008-2011: research infrastructure funds (PRTLI); PRTLI transferred to the Department of Enterprise, Trade and Innovation (DETI) through 2010 and 2011; DETI’s science budget; Science Foundation Ireland (a basic science funding agency, part of DETI).] 


Cheap sequencing 
Research-services giant Life 
Technologies of Carlsbad, 
California, announced 
on 14 December that it 
is now selling benchtop 
DNA sequencers to labs for 
less than US$50,000. The 
sequencers, the first of a new 
wave of affordable machines 
to reach the market, were 
developed by start-up firm 
Ion Torrent in Guilford, 
Connecticut, which was 
bought by Life Technologies in 
August. Current sequencing 
technologies label nucleotides 
with dyes, but the new machine 
uses semiconducting chips to 
detect hydrogen ions released 
as nucleotides are added to a 
DNA strand. Life Technologies 
also announced three out of 
seven $1-million prizes that 

it will award for solving key 
challenges in low-cost DNA 
sequencing. See go.nature.com/ 
mbhs6a for more. 


TB diagnosis 

The World Health 
Organization (WHO) said on 
8 December that a test that can 
rapidly diagnose tuberculosis 
(TB) was a ‘major milestone’ 
for disease control. The DNA- 
based ‘Xpert MTB/RIF’ test, 
developed by the non-profit 
Foundation for Innovative 
New Diagnostics in Geneva, 
Switzerland, and the company 
Cepheid, based in Sunnyvale, 
California, can detect TB 

in around 100 minutes. 
Traditional tests, based on 
sputum-smear microscopy, 
can take up to three months to 
yield results, the WHO said. 
The new test is costlier, so will 
need donor funding, although 
Cepheid will cut prices by 75% 
for poorer nations. 


CORRECTION 

The story ‘Synchrotron cuts’ 
(Nature 468, 736; 2010) 
incorrectly gave the three- 
year budget of the European 
Synchrotron Radiation 
Facility as €86.8 million. 
That number is the facility’s 
annual budget. The brief also 
said that cuts to “operating 
time” would be made. To 
clarify, two existing beam 
lines will be closed, but the 
accelerator will continue to 
run on its normal schedule. 


NATURE.COM 
For daily news updates see: 
www.nature.com/news 


16 DECEMBER 2010 | VOL 468 | NATURE | 873 
© 2010 Macmillan Publishers Limited. All rights reserved 


J. BARRETO/AFP/GETTY 


NEWS IN FOCUS 


Doomwatch denied as LHC fails to generate mini black holes p.876 
Recreational ancestry analysis takes to the web p.880 
Japanese probe misses its date with Venus p.881 
Hazardous dust raises concerns in Turkey and North Dakota p.884 


ENVIRONMENTAL SCIENCE 


Christiana Figueres (left), executive secretary of the UN climate framework, speaks with Mexico’s foreign 
minister Patricia Espinosa during the recent climate-change conference in Cancún. 


Last-minute deal 
saves climate talks 


Modest agreement brings developing world into the field. 


BY JEFF TOLLEFSON 


Negotiators did what they needed to do in Cancún, Mexico, to keep 
the United Nations climate talks from 
collapsing into the failure that many had feared. 
The true extent of their success, however, will 
depend on what comes next. 

Working into the small hours of 11 December, 
negotiators agreed that both developed and 
developing countries will act to reduce greenhouse-gas 
emissions — and that those actions 
will be registered and subjected to some form of 
international verification. The accord represents 
a major shift for developing countries, which 
faced no such commitments under the existing 
Kyoto Protocol, due to expire in 2012. The 
conference also reached a historic agreement on 
forest protection, and advanced programmes to 
help the developing world adopt clean energy 
and adapt to climate change. 

“It's not just about process, it’s about substance,” 
said Connie Hedegaard, the European 
Union's top climate official, in a news conference 
as the all-night talks wrapped up. “We have 
proven that multilateralism can create results.” 

The negotiators built on the broader frame- 
work of last year’s Copenhagen Accord. “Ideas 
that were just skeletal last year are now approved 
and elaborated,” says Todd Stern, the chief US 
climate negotiator. But unlike the Copenhagen 
document, which was blocked by a few 
countries in a rowdy final session, the Cancún 
agreement was adopted unanimously. “It’s a 
significant step forward,” says Stern. 

The agreement sets a goal of limiting aver- 
age warming to 2°C above preindustrial levels, 
while acknowledging that current commitments 
registered under the UN climate framework do 
not add up to meeting that goal. It also pledges 
to periodically review the goal on the basis of 
“the best available scientific knowledge”. Parties 
agreed to help mitigate emissions according to 
their own capabilities while a new international 
reporting system tracks their progress; the 
burden on developing countries would mainly 
fall on rapidly emerging economies. 

David Victor, director of the Laboratory on 
International Law and Regulation at the Uni- 
versity of California, San Diego, calls the global 
nature of the agreement “tremendously important”, 
although it mostly amounts to registering 
previous progress. “They have managed 
to reach agreement by moving the goalposts 
closer to the ball,” he says. 

The talks nearly faltered over the future of 
the Kyoto Protocol, which requires the world’s 
wealthiest nations — apart from the United 
States, which has never signed on — to meet 
specific emissions targets. Japan made waves 
at the meeting's outset by announcing it would 
not support an extension of the treaty after it 
expires. The final text of the Cancún agreement 
defers the Kyoto question for another year. 

The other major dispute in Cancún was 
between the world’s two largest greenhouse- 
gas emitters — the United States and China. 
With a little help from India, which staked out 
a position in the middle and softened its rheto- 
ric against binding commitments, both parties 
were able to agree on some basic requirements 
for reporting and verifying climate pledges. 

Progress on these issues was crucial to keep 
the talks alive and to open the way to more 
focused decisions in other areas. First among 
them was the deforestation agreement, which 
advocates hailed as a major accomplishment 
that will bolster bilateral and multilateral 
efforts already under way. 
The Cancún agreement establishes 
a framework that would allow wealthy 
nations to pay others for “reducing emis- 
sions from deforestation and forest 
degradation” (known as REDD) and aug- 
menting the carbon stocks locked up in forests. 
Collectively, the programme is called REDD- 
plus. The agreement requires developing 
countries to craft a national plan, estab- 
lish a baseline for historic emissions from 
forest loss and create a system for monitor- 
ing their forests. Just as importantly, says 
John Niles, director of the Tropical Forest 
Group in San Diego, California, the agreement 
calls on an existing technical body to 
look into the programme rules and requirements 
and then report back within a year. 
“Once we have those requirements, then 
everybody knows what we have to get to before any 
money changes hands,” says Niles. “This is the biggest 
decision we could have asked for.” 


Delegates also agreed to establish a Green 
Climate Fund to be managed by representa- 
tives of the developed and developing world 
to help channel aid; a Cancún Adaptation 
Framework to help to guide decisions 
on funding for adaptation measures in 
the developing world; and a technology- 
transfer mechanism to supply developing 
nations with technology for clean energy 
and adaptation. As promised in Copen- 
hagen, industrialized nations will provide 
some US$30 billion for these programmes 
by 2012, and up to $100 billion annually by 
the end of the decade, although where the 
money would come from remains unclear. 

Tim Gore, climate-change adviser for 
Oxfam International, based in Oxford, UK, 
lauds the Green Climate Fund but says that 
countries missed an opportunity to spell 
out long-term climate funding, perhaps 
through a levy on international aviation 
and shipping. Nonetheless, the agreement 
represents “a solid step”, says Gore. “They 
are now walking in the right direction, but 
they need to start running.” 


PARTICLE PHYSICS 


No black holes, but 
extra time at LHC 


Upgrade likely to be delayed in bid to capture Higgs particle. 


BY GEOFF BRUMFIEL 


The end of the world is not nigh after all. 
Flouting predictions from some theorists, 
microscopic black holes have so far 
failed to appear inside the Large Hadron Collider 
(LHC), scientists there have revealed. 

The result, which will be posted this week on 
arXiv.org, comes as researchers make plans to 
keep the LHC running until the end of 2012, 
rather than 2011 as previously scheduled. The 
27-kilometre collider at the particle-physics 
laboratory CERN near Geneva, Switzerland, 
had endured delays and a crippling breakdown 
before finally surging 
to life late in 2009, and 
physicists say it is now 
performing above 
expectations. 

Predictions of mini 
black holes forming at 
collision energies of a 
few teraelectronvolts 
(TeV) were based on 
theories that consider 
the gravitational effects 
of extra dimensions of 
space. Although the 
holes were expected to evaporate quickly, some 
suggested that they might linger long enough to 
consume the planet. But scientists at the Com- 
pact Muon Solenoid (CMS) detector now say 
they found no signs of mini black holes at ener- 
gies of 3.5-4.5 TeV. Physicist Guido Tonelli, the 
detector’s spokesperson, says that by the end of 
the next run, the LHC should be able to exclude 
the creation of black holes almost entirely. 

The find is one of a stream of recent papers 
from the LHC, made possible by the machine's 
unexpectedly high performance. “We were 
very surprised by how well behaved the 
machine was when we started really pushing it 
to its limit,” says Steve Myers, the CERN physicist 
who oversaw this year’s LHC operations. 
As a consequence, physicists are increasingly 
optimistic that they may be able to detect the 
elusive Higgs boson earlier than expected. The 
particle, the LHC’s best-known quarry, and its 
associated field are thought to endow other 
particles with mass. 


No black holes here: the Compact Muon Solenoid. 

Initially, physicists were not sure that the 
LHC could create and detect the Higgs at the 
machine’s current energies, and CERN man- 
agers had planned a 15-month hiatus from the 
start of 2012 for an upgrade that would allow 
it to run at higher 
energies. But a grow- 
ing consensus holds 
that even without the 
upgrade, the LHC will 
be able to explore most 
of the energy range 
in which a standard 
Higgs particle might 
be found. Sergio Ber- 
tolucci, CERN’s direc- 
tor for research and 
computing, adds that 
there are political rea- 
sons to extend the run. The world’s second most 
powerful accelerator, the Tevatron at Fermilab 
in Batavia, Illinois, is nipping at the LHC’s heels 
as it gathers a growing body of data in its own 
Higgs hunt. Moreover, the potential success of 
the LHC is likely to influence European plans 
for high-energy physics, as well as a global plan 
for a next-generation linear collider. Both face 
big budget decisions in the next few years. 

The plan to extend the LHC’s run will be 
discussed at a meeting of LHC managers in 
Chamonix, France, in late January, with a final 
decision expected shortly after. 


Francis Collins 
The bridge between 
lab and clinic 


Francis Collins, director of the US National Institutes of Health (NIH), has made the translation 
of basic research to the clinic a top priority. On 7 December, that goal moved a step closer to 
realization when the Scientific Management Review Board (SMRB) — an NIH advisory body — 
voted nearly unanimously to recommend establishing a centre for translational medicine. Collins 
had told the review board that despite a “dizzying rate” of basic science discoveries, “far too often 
promising diagnostic devices and treatments are not making it to market”. He urged that the NIH 
step into the breach. The proposed National Center for Advancing Translational Sciences would 
combine several existing programmes, have a budget of at least US$650 million and could be up 
and running by October 2011. 


What is the significance of this vote? 

This is a momentous occasion. The SMRB has 
recommended the formation of a new centre 
based on scientific arguments about oppor- 
tunities in translation. I think this is a signal 
moment, placing the NIH in a new position 
to play a more muscular part in therapeutic 
development. That being said, it in no way 
discounts all of the other things that we need 
to be doing in basic 
science, nor does it say 
that all of [NIH-funded] 
clinical research is going 
to be folded into this 
enterprise. 




What do you say to concerns that the new 
centre will draw resources away from basic 
research? 

I want to be very reassuring: although this 
centre is a new structure, it will not have much 
of an impact on the overall distribution of 
funds between basic and clinical research. 


How should basic researchers regard the centre? 

I think some basic scientists will be quite 
excited about the opportunity to be more con- 
nected with the clinical benefits of their own 
discoveries. Not that that’s at all necessary or 
required, or that basic scientists who don't feel 
that inclination should be considered somehow 
unmotivated. Science for science’s sake is also 
a wonderful way to learn about life. But I do 
think there is more here that is positive than 
is negative for a basic scientist, if people will 
step back from their anxieties about budgetary 
considerations. That being said, we should all 
be anxious about the overall budget right now 
with the expectation that dollars for biomedical 
research are going to be very hard to come by in 
the next year or two. But that can’t be a reason 
to stop promoting innovation. 


The Clinical and Translational Science Awards, 
which account for nearly 40% of the budget of 
the National Center for Research Resources 
(NCRR), are being moved to the new centre. 
What happens to the rest of the NCRR’s 
programmes? 

Again, the strong assurance is that these 
programmes are valued, that they will be sup- 
ported, that the people involved in them are 
doing great work. There is no intention here to 
dismantle them. But if there are opportunities 
to reorganize and reassign these programmes 
in ways that make them more interactive with 
what we are trying to do in this new centre, well, 
that seems like a good thing to consider. 


What do you say to critics who contend that 
you’re just rushing this through? 

Well, I’m a guy in a hurry. I will admit it. When 
I contemplate the urgency of finding treatments 
and cures for disease, it is hard to be 
comfortable with an argument that says: ‘go 
slow and take your time’. The SMRB took a 
comprehensive look at the situation and con- 
cluded that the scientific opportunities are here 
now. Why would you want to delay? 


If this were easy, drug companies would not 
be struggling with languishing new-drug 
pipelines. What can the NIH do with this 
centre that the pharmaceutical giants aren’t 
already doing? 

It most certainly will not be easy. But there 
has been a recent deluge of discoveries about 
the molecular pathogenesis of disease. This 
has revealed hundreds of new potential drug 
targets. For rare and neglected diseases, 
economic considerations will limit private- 
sector interest; but NIH-funded researchers 
can explore the earlier stages in the drug- 
development pipeline to ‘de-risk’ projects that 
would otherwise lie untouched. Similarly, for 
common diseases, many of the new molecular 
discoveries are of uncertain value for drug 
development, but NIH investigators can 
validate these drug targets and develop prom- 
ising lead compounds, as well as carrying out 
process engineering on the pipeline itself. 
The goal will be to bring each project just far 
enough to become of interest to the private 
sector to pick up. 


PHARMACEUTICALS 


Slim spoils for obesity drugs 


Drug makers struggle to find viable treatments for global epidemic. 


BY HEIDI LEDFORD 


When an obesity drug that he had 
helped to invent came up before a 
panel of US Food and Drug Administration 
(FDA) advisers last week, physiologist 
Michael Cowley couldn't bear to watch. “It’s like 
watching your favourite team,” says Cowley, 
director of the Monash Obesity and Diabetes 
Institute in Victoria, Australia. “You worry that 
if you pay too much attention they'll lose.” 

Many thought that Cowley’s drug, Con- 
trave, didn’t stand a chance. The same 
panel had already voted against two 
obesity drugs this year and said a third 
should be pulled off the market. 

But, defying expectations, the panel 
voted in favour of Contrave, making it the 
first obesity drug to win a recommenda- 
tion for approval in more than a decade. 
It was a rare taste of success for a field that 
has progressed so slowly that many have 
abandoned it altogether. “There is surpris- 
ingly little activity now given the poten- 
tial size of the market and the high unmet 
need,” says Michael Hay, an analyst at the 
market-research firm Sagient Research 
Systems, based in San Diego, California. 

Effective obesity drugs have proven 
enormously difficult to develop. The 
brain circuits responsible for appetite over- 
lap with those that control other important 
functions, including mood, raising the risk of 
side effects. And obese patients would prob- 
ably have to take a drug for years, with testing 
involving large patient populations, driving up 
development costs. 

Most devastating to the field was the failure of 
drugs designed to block receptors in the brain 
that respond to appetite-stimulating chemicals 
called cannabinoids. Several major pharmaceutical 
firms had pursued this angle, but gave 
up after the FDA's advisory panel voted down 
Paris-based Sanofi-aventis’s drug rimonabant in 
2007. The London-based European Medicines 
Agency had already approved rimonabant, but 
in 2008 it advised doctors not to prescribe it 
given the risk of suicidal tendencies. 

Hopes were running high for a turnaround, 
with three obesity drugs coming before the 
FDA this year. First up was a drug called 
Qnexa from Vivus, based in Mountain View, 
California. Qnexa is a combination of two drugs 
already on the market. One, phentermine, is a 
brain stimulant that suppresses appetite. The 
other, topiramate, is a treatment for epilepsy. 
On 15 July, FDA advisers said that the drug’s 
modest weight-loss benefit was not sufficient to 
counterbalance the risk of its side effects, which 
include memory problems and birth defects. 
The next drug to go before the panel was 
lorcaserin, made by Arena Pharmaceuticals 
in San Diego. But on 16 September, the panel 
again voted no. The drug simply didn’t work 
well enough to make it worth the safety risks, 
says Abraham Thomas, an endocrinologist at 
the Henry Ford Hospital in Detroit, Michigan, 
who chaired the committee. 


MARKET MOVEMENTS 
Only one of a trio of obesity drugs submitted to a Food 
and Drug Administration panel got the green light. 
[Graph: percentage change in share price. Annotation: ‘Contrave gets vote of approval.’] 


UP-AND-COMING: THE NEXT OBESITY DRUGS IN THE PIPELINE 

Drug | Developer | Target | Clinical trial status | Estimated approval date | Projected revenue 
Victoza (liraglutide) | Novo Nordisk | GLP-1 receptor | Phase II | 2015 | $780.4 million 
Empatic (zonisamide/bupropion) | Orexigen Therapeutics | Dopamine and noradrenaline reuptake | Phase IIb | 2017 | $443.0 million 
Symlin (pramlintide)/metreleptin | Takeda/Amylin | Amylin receptor, leptin receptor | Phase IIb | 2015 | $898.0 million 
Velneperit | Shionogi & Co. | Neuropeptide Y/peptide YY receptors | Phase IIb | 2014 | — 
Tesofensine | NeuroSearch | Dopamine, noradrenaline and serotonin reuptake | Phase IIb | — | — 

By the time Contrave took the stage, market 
investors had become sceptical (see graph). 
Contrave, developed by Orexigen Therapeutics, 
based in La Jolla, California, and co-founded 
by Cowley, is also a blend of two approved 
drugs. One, bupropion, is an antidepressant 
that blocks the effects of the neuro- 
transmitter noradrenaline. The other, 
naltrexone, inhibits the effects of opi- 
oids on the brain and is used to treat 
alcoholism. Together, the two boost 
the activity of a brain circuit called 
the POMC pathway, which reduces 
hunger. A final decision on Contrave 
is expected early next year. 

But even if approved, Contrave 
will hardly spell the end of the obesity 
epidemic. In one recent study, those 
who took the drug lost only about 8% 
of their body weight over six months 
— little better than orlistat, the only 
over-the-counter obesity drug cur- 
rently available in the United States. 
Most US health-insurance plans don't 
cover weight-loss drugs, making a minimally 
effective drug even less appealing. 

A change of fortune may require a change of 
strategy, says Thomas Hughes, chief executive 
of Zafgen, an obesity drug company in Cam- 
bridge, Massachusetts. In the past, most compa- 
nies pursued drugs that would restrict appetite. 
Now, he says, they’re looking for new ideas, “but 
those ideas are few and far between”. 

The next obesity drugs likely to face the 
FDA target metabolism, says Hay (see table). 
Victoza (liraglutide), developed by Novo Nor- 
disk, based in Bagsvaerd, Denmark, to treat 
type 2 diabetes, mimics a gut hormone called 
‘glucagon-like peptide 1’, which boosts insulin 
sensitivity and slows stomach emptying. 

Although more than a dozen other drugs are 
in early clinical trials, Thomas remains pessi- 
mistic. Many of them target pathways that have 
already been tried, he notes, or are reformula- 
tions of drugs that didn’t make it the first time 
around. “You can see what's in phase I clinical 
trials now and the answer is ‘nothing very excit- 
ing’,” agrees Stephen Bloom, an endocrinolo- 
gist at Imperial College London. “The obesity 
problem is unsolved, and looks like it’s going to 
stay that way for quite some time.” 


Modelling human liver function means integrating data at scales from the 
whole organ down to molecules, and decades to microseconds. 


Germans cook up 
liver project 


Biologists join physicists in a bid to map the workings 
of the human organ at all scales. 


BY ALISON ABBOTT 


Systems biology — the holistic, interdisciplinary 
approach to modelling the 
processes of life — has set itself an ambitious 
new goal. 

Launched in Dresden last week, the Vir- 
tual Liver Network is a German collaboration 
between biologists and theoretical physicists to 
model the functioning human liver. The work 
could help to develop more effective medi- 
cines, for example, because the metabolism of 
drugs in the liver has a profound impact on 
their efficacy and toxicity; or help in the under- 
standing of liver disease. All foreign molecules 
are taken up into liver cells to be metabolized 
in preparation for excretion from the body. 

Although some models of molecular pathways 
in liver cells can already predict how a drug 
may break down to become active or produce 
a toxic chemical, the biological consequences 
of this can only be predicted with a model of 
the liver’s entire interacting system of cells and 
tissues. Similarly, a model of the entire liver will 
reveal much more about liver disease than a 
model of molecular interactions alone will. 

The central challenge of the project lies in 
developing mathematical models that can unite 
data about processes that operate on vastly different 
temporal and spatial scales (see ‘From 
macro to nano’). If successful, the network will 
integrate models of subcellular molecular signalling 
pathways with models of how a whole 
cell works, eventually building up a model of 
the entire organ that will be available to drug 
developers and other researchers. 

Success will also depend on meeting an 
equally tough sociological challenge — getting 
250 scientists in 69 research groups around Ger- 
many to work towards a common goal. “The 
network is very demanding and its principal 
investigators all have many different pressures 
to attend to,” says Ursula Kummer, a modeller 
from the University of Heidelberg. “But we are 
all very excited about the challenge of multiscale 
modelling, and that’s what motivates us.” 

The German federal research ministry has 
provided €43 million (US$57 million) to sup- 
port the network for five years. It expands on 
the €36 million HepatoSys systems-biology 
programme, which between 2004 and 2009 
worked towards modelling the hepatocyte, 
the most abundant cell in the liver. 

The country’s research community was 
hostile to HepatoSys at first, resenting the 
programme's top-down orchestration by the 
government's ministry of research. Many 
scientists also initially scorned the ministry's 
decision to use freshly dissected hepatocytes, 
rather than an off-the-shelf cultured liver cell 
line that would have been 
[…] would reach a standard quality in all research 
labs, but scientists have come round to the 
benefits. “We've found in fact that cell lines in 
culture don't behave much like real liver cells,” 
says molecular biologist Ursula Klingmüller 
from the German Cancer Research Center in 
Heidelberg. 

And Marino Zerial, a director at the Max 
Planck Institute of Molecular Cell Biology and 
Genetics in Dresden, points out that hepato- 
cytes offer “a great system for experimenters” 
because they naturally take up foreign mole- 
cules. Unlike most other cell types, they readily 
incorporate RNAi — RNA interference mol- 
ecules, which target individual genes in cells 
for suppression or silencing. 

HepatoSys involved 47 researchers, and 
some collaborating biologists and physi- 
cists formed intense bonds. Zerial and Yan- 
nis Kalaidzidis, his colleague at the institute, 
describe themselves as “almost inseparable”. 

Few papers have emerged from the six- 
year programme, but funders consider it a 
success. “We needed that amount of time to 
learn to speak each other’s languages and to
work to a common goal,” says Gisela Miczka, 
who administers the Virtual Liver Network 
for the research ministry. “Plenty of papers 
from HepatoSys will start coming out in the 
next two years.” With other colleagues in Dres- 
den, Zerial and Kalaidzidis took five years to 
develop a model that describes how nutrients 
or signalling molecules are transported into 
the cell. They are only now preparing to submit 
a manuscript on the findings. 

The Virtual Liver Network will continue to 
generate data and develop spatio-temporal 
models of biological events on the cellular 
scale, but will also try to generate a new theo- 
retical framework for systems analysis at every 
scale of liver function. The multiscale model- 
ling will rely on standard tools including dif- 
ferential equations and stochastic statistical 
methods, says Kalaidzidis, but it will also 
require more elaborate mathematics to bridge 
data sets at different scales. “There is no gen- 
eral recipe for moving between scales,” he says, 
so the researchers must work out the essential 
features at each scale before applying them at 
the next level. 

This is not the first attempt to model a 
human organ. Physiologists around the world 
have been working on a computational model 
of the heart for a few years, and this effort set 
the stage for a European Union programme 
on modelling different organs — the Virtual 
Physiological Human Network of Excellence, 
launched in 2008 — on multiple scales. “But 
we haven’t yet really bridged the scales much,”
says Peter Hunter, director of the Bioengineering 
Institute at the University of Auckland in New 
Zealand, who was a major driving force behind 
the programme. The large size of the Virtual 
Liver Network and the basic, single-scale work
that has already been done mean the liver effort
“is well-placed to make breakthroughs”.
The rise of the
genome bloggers 


Hobbyists add depth to ancestry trawls. 


BY EWEN CALLAWAY 


Hours after Joseph Pickrell put his
genome on the internet, an anonymous
blogger took the data and
concluded that he came from Ashkenazi Jew- 
ish stock. Pickrell, a genetics graduate stu- 
dent at the University of Chicago, Illinois, 
was sceptical about the claim. But after talk- 
ing to relatives, he discovered that he had a 
Jewish great-grandfather who had moved to 
the United States from Poland at the turn of 
the nineteenth century. “It was a part of my 
ancestry I was totally unaware of,” he says.
The blogger, who writes under the pseudonym
Dienekes Pontikos at http://dodecad.blogspot.com,
had commandeered Pickrell’s
DNA as part of the Dodecad Ancestry Project,
an ambitious project in which cutting-edge 
genomic analysis meets Web 2.0. Pontikos 
analyses genetic data submitted by followers
of his blog to reconstruct personal ancestry
and human population history, and reports
his findings online. He is part of a small but
growing group of ‘genome bloggers’, a mix of
professional scientists and hobbyists proving
that widely available tools for computational
biology could enable recreational bioinformaticians
to make new discoveries.

“They are not amateurs. They are far
from being amateurs,” says Doron Behar, a
population geneticist at Rambam Health Care
Campus in Haifa, Israel, who studies human
history. “I cannot stress enough the level of
appreciation I have for their efforts.”

Pontikos has so far analysed several hundred
thousand single-letter DNA variations from
more than 2,200 individuals. That includes
more than 200 submitted to him by readers of
his blog, who had had their genomes analysed by
genetics testing firms such as 23andMe, based
in Mountain View, California, with the remainder
coming from publicly available datasets. The
readers volunteering their genomes (identities
stay private) are mostly keen to delve into their
own ancestry. But Pontikos, who is from Greece
and describes himself as an “anthropology dilettante”,
is more interested in unfurling the history
of populations that tend to be overlooked
by human-population geneticists. For instance,
his analysis of genomes from people living in
northern Eurasia reveals a genetic connection
between populations in northern Finland and
central Siberia (see ‘Meet the ancestors’).

MEET THE ANCESTORS
An analysis of data from the Dodecad Ancestry Project highlights genetic links between different ethnic groups.

David Wesolowski, a 31-year-old Australian
who runs the Eurogenes ancestry project
(http://bgal101.blogspot.com), also focuses on
understudied populations. “It’s a response, in
a way, to the lack of formal work that’s been
done in certain areas, so we’re doing it ourselves,”
he says. Wesolowski and a colleague have drilled
into the population history of people living
in Iran and eastern Turkey who identify as
descendants of ancient Assyrians, and who sent
their DNA for analysis. Preliminary findings 
suggest their ancestors may have once mixed 
with local Jewish populations, and Wesolowski 
plans to submit these results to a peer-reviewed 
journal. 

But Pontikos sees little point in formally 
publishing his findings. “I can bypass them 
entirely, and have the entire world review what 
I write,” he wrote in an e-mail. Indeed, comments
on his blog (“could you please provide
the eigenvalues for the principal component
analysis”, for instance) read like the niggling
recommendations of a manuscript reviewer.

Pickrell notes that Dodecad and Eurogenes 
use cutting-edge techniques and open-source 
software developed by geneticists studying 
population history. The methods — which 
involve modelling past mixing between popu- 
lations and distilling vast quantities of genotype 
data — still stir debate in the peer-reviewed 
literature because they can be difficult to interpret 
unambiguously, says John Novembre, a popu- 
lation geneticist at the University of Califor- 
nia, Los Angeles. Behar, whose data on Jewish 
ancestry have been used by both projects, cau- 
tions that the techniques are more robust when 
applied to the history of an ethnic group, rather
than the ancestry of an individual. 

In response to concerns about the genetic 
privacy of those offering their genomes for 
analysis, “I don't think this is too worrisome,”
says Hank Greely, director of the Center for 
Law and the Biosciences at Stanford University 
in California. Both projects provide adequate 
privacy protection, he says, although they both 
could do a slightly better
job at disclosing the risk

learn more about their ancestry, the genetic
and trait data needed for biomedical applica- 
tions are much harder, if not impossible, for 
amateurs to come by. Public repositories, such 
as the US National Institutes of Health's data- 
base of Genotypes and Phenotypes, tightly 
restrict access. 

One effort to change that is the Personal 
Genome Project, which is spearheaded by 
George Church, a geneticist at Harvard Medical
School in Boston. The project aims to make
the complete genome sequences and traits of 
100,000 people freely available to anyone, with 
no strings attached. So far it has enrolled 1,000 
participants and published near-complete 
genomes for 10 of them. Pickrell and 11 other 
scientists and genomics experts also added 
to the trove of freely available genomic data 
recently when they released their genetic data 
as part of a project called Genomes Unzipped.

Church argues that better access to high-quality
data could help this kind of informal
bioinformatics to flourish, enabling computer- 
savvy people to make important contributions 
to genomics, just as they have with online busi- 
nesses such as Facebook. “It didn't take that 
much training to become a social-networking 
entrepreneur. You just had to be a good coder,” 
he says. With bioinformatics, “I think we’re in 
a similar position.”
Developers call for handy lab aids


Macmillan hopes to partner with scientists to turn software into commercial products. 


BY DECLAN BUTLER 


It’s an unfortunate truism that scientists
often have better software tools for managing
their music or family photos than they
do for tracking their experiments and data.
Major software companies tend to focus on
much larger consumer and business markets,
and offer little software for researchers.

As a result, others are rushing to occupy
the niche, among them Digital Science (www.digital-science.com),
launched last week by
Nature’s parent company, Macmillan Publishers.
Digital Science's strategy is not only to
develop its own products, independently
and with partner companies, but also to tap
into the innovation of researchers themselves.
With few existing products available to satisfy
their needs, a growing number of scientists
have developed their own tools with which to
better organize their research lives.

The company has launched an open call for
researchers who have written promising software
to submit proposals for turning it into a
commercial product. Researchers understand
their colleagues’ needs best, but often “don’t
have resources to turn their software into a
polished product”, says Timo Hannay, Digital
Science's managing director and former publishing
director of nature.com. The company
aims to partner with researchers, or their start-ups,
to provide them with the financial, developer
and business resources they need.

The goal is to offer researchers tools that are
as intuitive and user-friendly as well-designed
consumer software. The company will initially
focus on text-mining software, metrics-based
tools to help institutions and funders better
assess the performance of their funding and
researchers, and lab-management software to
help to keep tabs on anything from experiments
to reagents.

There is certainly a massive need for better
software to increase productivity at every
stage of the complex scientific workflow,
says Alexander Griekspoor, who founded
the company Mekentosj, based in Aalsmeer,
the Netherlands, which produces software 
for molecular-biology applications. Sriram 
Kosuri, a bioengineer at the Wyss Institute for 
Biologically Inspired Engineering at Harvard 
University in Boston, Massachusetts, agrees. 
“Every bench scientist I know has their own, 
and self-admitted non-optimal way to organize 
their information,” he says. Kosuri is a founder
of OpenWetWare (openwetware.org), a wiki 
for sharing lab protocols and data between 
biology groups worldwide. “I think there is an 
increasing willingness to pay for useful tools,” 
he says. 

Scientists struggling to organize thousands 
of article PDFs strewn across their hard drives 
have already embraced services such as Griekspoor’s
Papers (mekentosj.com/papers), and
London-based Mendeley (www.mendeley.com),
which bring the simplicity of iTunes to
managing papers. Mendeley also offers social-networking
facilities and other features.

Publishing giant Elsevier has recently 
entered the research-services market with its 
SciVal suite of metrics tools. It is one of several 
publishers that view providing institutions
with performance-measuring applications
as a strategic move, says David Bousfield, 
the London-based vice-president and lead 
analyst at Outsell, a publishing and informa- 
tion consultancy. Elsevier has also launched 
SciVerse, a platform for searching and shar- 
ing content from Elsevier’s own databases and 
the web. The product provides programming 
interfaces that allow researchers to build their 
own customized applications on top of content 
from Elsevier and other data sources, such as 
government databases. 

Kosuri thinks that Digital Science's strategy 
of refining software created by researchers 
makes sense. “Getting the most important sets 
of tools developed more extensively would be 
tremendous,” he says. But Michael Eisen, a 
geneticist at the University of California, Ber- 
keley, and co-founder of the Public Library of 
Science, notes that many researchers have lit- 
tle enthusiasm for devoting time to perfecting 
software they have written. “I expect few would 
be interested,” he says. 

“We're not trying to convince anyone to 
create a commercial product if the desire 
isn’t already there,” Hannay counters. “We're 
trying to tap into the small but significant 
proportion of researchers who have identified 
an unmet need and are trying to do something 
about it.”

Freely available code isn't an ideal alterna- 
tive — most open-source software for scien- 
tists “sucks”, says Eisen, although “a lot of it is 
really good and essential to what we do”. But 
many lab-management tasks can be done using 
generic consumer open-source tools, he says. 
“The people I know mostly use wikis to keep 
track of stuff in the lab: they’re free, flexible, 
and easy to set up and use.” 

Eisen also warns against thinking of software 
as a panacea. “Everyone has aspirations to be 
better organized in the lab,” he says. “They think
there’s a magic piece of software out there that
will solve all their problems for them, but then
they realize the problem is really that they’re
disorganized.”
Venus miss is a setback for
Japanese programme 


Akatsuki mission on hold for six years before next attempt to approach planet. 


Top: the craft was set to monitor Venus’s atmosphere. Bottom: images taken as Akatsuki sped away. 


BY DAVID CYRANOSKI 


Few events can be as gut-wrenching for
a planetary scientist as a multi-million-dollar
spacecraft going silent while
executing a crucial manoeuvre. Loss of signal
at such times usually spells disaster, and the
spacecraft may never be heard from again.

Researchers and engineers working with
Japan’s Akatsuki spacecraft were spared that
worst-case scenario on 6 December. Although
Akatsuki failed to make contact for more than
an hour after the scheduled engine burn that
was to place it in orbit around Venus, it did
eventually call home. But the news was not
promising. Not only had Akatsuki been tumbling
out of control for a period of time, it had
failed to enter orbit. It will now have to circle
the Sun for six years before it gets a second
chance.

The failure derails an ambitious programme
of research into Venus’s atmosphere, and marks
the third time that the Japan Aerospace Exploration
Agency (JAXA) has battled mechanical
problems on a mission to another Solar System
body. In 1998, a faulty valve caused a loss of fuel
on JAXA’s Nozomi spacecraft, which ultimately
prevented it from orbiting Mars. And the Hayabusa
probe, which returned a minute quantity
of asteroidal material to Earth this year, experi- 
enced a variety of near-fatal problems. 

At a press conference on 10 December, officials
reported that Akatsuki’s engines fired for
less than 3 minutes, far short of the 9 minutes 
and 20 seconds required to slip into orbit. “We 
are trying our best to get rid of any precon- 
ceived notions and figure out what happened,” 
a project team member told Nature. 

Akatsuki was to scour Venus with an infrared 
camera for evidence of volcanic activity, study 
lightning in the atmosphere and investigate the 
dense cloud layer that hides the planet’s surface 
from view. Its planned equatorial orbit, timed
to match the ‘super-rotation’ of Venus’s atmosphere,
which spins about 60 times faster than
the planet beneath it, would have allowed it
to follow the long-term evolution of features in 
the cloud layer. Such data would have comple- 
mented the global coverage of Venus Express, 
the European Space Agency (ESA) probe that 
has been orbiting Venus since 2006. 

“This is very disappointing for all of us,” 
says Hakan Svedhem, an ESA project scien- 
tist for Venus Express. “We had hoped to do 
many things jointly.”

Engineers will now pore over telemetry data 
from Akatsuki and conduct tests with backup 
hardware on Earth to try to identify the source 
of the failure. The spacecraft’s fuel system is 
likely to get close scrutiny: Akatsuki uses the 
same two-fluid hydrazine-nitrogen tetroxide 
thruster as Nozomi, although the valve issue 
has been addressed. 

A series of images taken as Akatsuki sped 
away from Venus shows that its cameras are 
working well and may yet be put to use if the 
spacecraft survives its unplanned detour. One 
hurdle faced by Nozomi on its second attempt 
to enter orbit — frozen fuel — will probably not 
affect Akatsuki, because of the probe's proximity
to the Sun. Solar radiation might, however,
take a toll on the craft's sensitive instruments. 

If Akatsuki does reach Venus in 2016, it might 
still be able to join forces with other probes. Last 
month, ESA agreed 
to extend the Venus 
to the craft’s orbit to
save fuel. Meanwhile, NASA is planning SAGE, 
a Venus lander that could launch in 2016. 
“Understanding Venus is important because 
it informs us about the evolution of the climate 
on Earth,” says Sanjay Limaye, an atmospheric 
scientist at the University of Wisconsin-Mad- 
ison and a co-investigator on Akatsuki. “Not 
going into orbit now does not translate into a 
diminished interest in Venus, as the questions 
do not go away,” he says.


CORRECTION 

The News story ‘Self-plagiarism case
prompts calls for agencies to tighten
rules’ (Nature 468, 745; 2010) stated
that Reginald Smith had escaped censure
for research misconduct for publishing 
duplicate papers. In fact, Smith was formally 
reprimanded for reuse of published 
materials and data in multiple publications, 
although separate allegations of data 
falsification and plagiarism were not upheld. 
To learn the chemical language of plants, Ian Baldwin has built
up a German research empire that engineers seeds, and a field
station in the Utah wilderness to grow them.

BY ALISON ABBOTT

In late spring 1988, Ian Baldwin was driving
through the desiccating heat of the Utah
desert in his rickety old VW microbus. The
young researcher, from the State University
of New York (SUNY), Buffalo, was searching for
a native species of the tobacco plant as well as
a place to sleep for the night. When he pulled
up at the Desert Inn Ranch, he encountered a
different form of wildlife. A posse of ferocious
dogs flew out of the gate, puncturing his car
tyres with their teeth. Behind them was rancher
Herb Fletcher, cradling a submachine gun.

Baldwin was terrified. But when Fletcher
called the dogs off, Baldwin slipped, very
cautiously, out of the bus. Fletcher smiled
(“He had a wonderful smile,” recalls Baldwin)
and invited him in. The scientist and the
old rancher quickly bonded over their shared
interest in natural history. It was the start of a
firm friendship, and the opening of a new
era in Baldwin’s research life, one that has
helped propel him into a dominant position in
the burgeoning field of chemical ecology, the
study of the chemical signals between plants
and other organisms in the environment.

Rooted and unable to flee, plants have
evolved many ingenious ways of repulsing
their enemies, from generating noxious chemicals
in their leaves to emitting complex, volatile
bouquets to attract predators that will pick off
the plant’s attackers. It is a highly sophisticated
chemical language undetectable by the human
nose and largely undeciphered by science. But
if and when it can be understood, it might
open the way to modifying plants’ signals to
give them stronger protection, or to developing
environmentally friendly mimics of natural
signals as alternatives to herbicides.

In his efforts to understand this language, 
Baldwin has embarked on a project unique in 
its ambition and scale, carried out along what 
he calls “the longest lab corridor in the world”. 
Working in Jena, Germany, where he is a direc- 
tor of the Max Planck Institute for Chemical 
Ecology, he and his team develop powerful 
genetic tools to systematically knock out, or 
knock down, genes involved in making the 
chemical signals. Then they observe the effects 
by growing the modified plants in the wild — 
8,844 kilometres away, next to the Utah ranch. 
The fastest journey from Jena to the field sta- 
tion takes 27 hours. The researchers have little 
choice, however. In Germany, with its populist 
aversion to anything genetically modified, such 
trials cannot proceed. 


COCKTAILS AND CRAZINESS 

High-profile papers roll out of Baldwin's insti- 
tute with regularity. One published in Science in 
August showed that the plants, when nibbled
by herbivorous insects, can change the ratio of 
isomers of some of their signalling molecules 
specifically to attract predators of the leaf- 
eaters. And although Baldwin deliberately keeps 
his distance from applications, the agricultural 
industry studies his results attentively. It wants 
to learn how plants, or mixtures of plants, might 
be persuaded to produce the best cocktails of 
volatile emissions for their own defences.

“Ian Baldwin is like a madman,” says Ted
Turlings, a plant scientist at the University of 
Neuchâtel in Switzerland, with some awe. “He
doesn’t stop working, day and night, and he lets 
nothing get in his way. He sets up a field station 
in an area where no one would think of going, 
builds himself a complete molecular tool set for 
his tobacco species, and sets out with purpose to 
get permission to use transgenic plants.”

For Baldwin, though, the approach is the only 
way to learn how a particular plant has evolved
to survive in the real, stressful world of harsh 
weather and hungry insects. “It seems to me it 
would be madder not to do it this way,” he says.

With his humorous and low-key manner, 
and customary jeans, plaid shirt and baseball 
cap, Baldwin, 52, shows no sign of frenzy. The 
son of academic historians in Baltimore, Mary- 
land, he decided early on that the ivory tower 
was not for him. “My parents are medievalists 
and live in the eleventh century,” he says. “I 
wanted to know what the rest of the world did 
for a living.” During high school and college
he worked in his spare time as a fish cleaner, 
landscaper, truck driver, an auto and tractor 
mechanic, a logger and tree climber, and even 
a maple-sugar producer. And some of these 
skills became useful in unexpected ways. 

As an undergraduate majoring in chemistry 
and biology at Dartmouth College in Hanover, 
New Hampshire, he became an informal assist- 
ant to Jack Schultz, one of the earliest pioneers 
of chemical ecology. Schultz, who was studying
forest canopies but couldn't stand heights himself,
was as attracted to Baldwin's tree-climbing
skills as to his precocious ability in the chemistry
lab. In a joint experiment published in Science
in 1983, the pair claimed that chemicals from
leaves that had been ripped to mimic insect
damage could travel through the air to neighbouring
plants and change their biochemistry in
a way that wards off further insect attack. Their
‘talking trees’ notion was dismissed by many
plant scientists as a fanciful over-interpretation
of results.

Ian Baldwin looks after Nicotiana attenuata plants at one end of the world’s longest lab corridor, his greenhouse in Jena, Germany.




Burned by the reaction, Baldwin decided to 
play it safe for his PhD studies at Cornell Univer- 
sity in Ithaca, New York, and instead researched 
the more mainstream internal signalling path- 
ways within plants. The favoured plant model in 
the Cornell lab was Nicotiana sylvestris, a species 
of tobacco native to Peru. But when it came to 
extending this work into the field, Baldwin, by 
now on the tenure track at SUNY, decided to 
switch to a similar species native to the United 
States. His hunt for Nicotiana attenuata in Utah 
led to the fateful encounter with Fletcher. 


Baldwin and his family spent the next seven 
summers with Fletcher, who regaled them 
with stories about the area’s violent recent his- 
tory. Fletcher had been brandishing a weapon 
at his first meeting with Baldwin because a 
neighbouring family of polygamists had been 
attacking him and his land. A mobster friend 
eventually put a stop to it. 

If, during those summers, Baldwin learnt new 
things about how humans defend themselves, he 
learnt even more about plant defences. Fletcher 
would drive Baldwin around the 1,300-square- 
kilometre property to point out small clumps of 
N. attenuata he had spotted while out ranching, 
and Baldwin would use them in experiments to 
find out, for example, how the plants activated 
chemical defences against herbivores. 


THE HUNT FOR HOTSPOTS 

After a brush fire in 1992, Baldwin discovered 
that the seeds of N. attenuata germinate only 
when activated by components of wood smoke 
penetrating the soil around seeds. Then they 
suddenly flourish in the temporarily nutrient- 
rich, herbivore-free, post-fire environment. He 
learnt to locate natural populations of N. attenu- 
ata more efficiently by chasing lightning strikes 
— “or simply phoning the fire department and 
asking them where they spent money”. One of 
his earliest series of experiments was designed 
to understand the costs and benefits to the plant 
of one particular defence mechanism — pro- 
ducing nicotine in its roots and then pumping 
the toxin up into its leaves — in the field. 


Plant biology had been transformed in 1990 
with the discovery that the hormone jasmonic 
acid could induce volatile signalling. Baldwin 
immediately set out to see if it could also induce 
nicotine production, and found that it could. So 
he synthesized an artificial version of the hor- 
mone that he could inject into the soil around 
plant roots. His complex set of experiments 
involved four populations of N. attenuata, 
each with more than 1,000 plants, spread over 
a 100-kilometre loop. He found that among 
plants pretreated with jasmonate to artificially 
induce a level of nicotine defence, those that 
were not attacked produced less seed than 
those that subsequently were attacked, but not 
ravaged, by herbivores. With this work, he
proved a point about plant defence that had 
previously only been assumed — that using 
defences only when needed is an evolutionary 
advantage, because it maximizes benefits and 
minimizes cost. 

More than a decade after his ill-received 
‘talking trees’ Science paper, Baldwin was itch- 
ing to return to the theme. The work had been 
vindicated by the early 1990s when several 
scientific groups working on crop plants had 
established that such plant volatiles not only 
exist, but stimulate responses from other spe- 
cies — pathogens, herbivores, herbivore-eating 
carnivores, and perhaps other plants as well. 
But most of the work had been done under 
laboratory conditions. 

Baldwin wanted to understand plant biol- 
ogy in the real world. He particularly wanted to 
explore the volatile chemicals that plants exude,
using genetic manipulation to take apart the 
machinery involved in making them. He was 
convinced that the desert-dwelling N. attenu- 
ata would make an ideal model because it has 
evolved such an array of mechanisms to sur- 
vive severe environmental stresses including 
fire, herbivores and drought. 

What wasn’t yet available for N. attenuata, 
however, was the toolkit necessary for genetic 
engineering. One was already being assembled 
for Arabidopsis — the favoured study subject of 
lab-based plant biologists, but one that Baldwin 
regarded as “a boring weed of no use to eco- 
logical evolution research”. He wanted to have 
the same for his N. attenuata — but he knew it 
would be expensive and take many years. 

That didn't discourage the Max Planck Soci- 
ety in Munich, which recruited him in 1995. 
Following the reunification of Germany in 1990, 
the society was obliged to extend its network of 
research institutes into former East Germany, 
and it took the opportunity to add exciting areas 
of research and recruit more foreign directors. 
The society wholly bought into Baldwin’s vision,
and made him a founding director of the Max 
Planck Institute for Chemical Ecology in Jena. 
All Max Planck directors are given generous, 
guaranteed funding and time to develop long- 
term projects without having to apply for grants, 
making Baldwin's dream possible. 

Baldwin now has a formidable institute 
employing more than 50 researchers, students 
and technicians to develop the chemical, as
well as genetic, tools to mimic or block signalling
pathways.

Not even the might of the Max Planck Society
could help him realize one part of his original
vision: to stage field trials of genetically
modified plants native to Germany. “Even if
you do get approval, the German government 
now requires the GPS positioning of all field 
trials with transformed plants to be posted on 
the web — so every single trial is destroyed by 
activists,” he says. 

In the United States, the work is arduous 
but doable. The team in Jena engineers seeds 
and sends them to the US Animal and Plant 
Health Inspection Service in Rockville, Mary- 
land, where they are inspected and sent on to 
the field station in Utah. This season, Baldwin's 
team planted out 4,000 seedlings for around 36 
studies on topics ranging from plant—pollinator 
interactions to how plants allow their roots to 
be colonized by microbes. “This is really a lot of
backbreaking work,” says Baldwin.
Life at the field station is tough in other
ways. Brush fires spread “faster than you can 
run”, says Baldwin. In 2005, a fire spotted on
the horizon sent a dozen or so scientists run- 
ning for their lives. Aware that their station has 
an explosive tank of propane fuel, they tore 
away in vans. (The fire shifted direction before 
reaching the station.) Baldwin admits to being 
“tyrannical” about safety. In a region shared 
with deadly animals such as the sidewinder 
rattlesnake, he forbids iPods and sunglasses — 
“people have to be able to hear and see snakes” 
— and no one is allowed to wander around the 
desert alone. It is an hour’s rough drive to the 
nearest hospital. Everyone must learn how to 
change a tyre on the vans. 


LIFE IN THE WILD 

Nature can add to their troubles in other ways. 
This year, for example, an elaborate series 
of experiments designed to study how the 
Empoasca leafhopper, a herbivore, recognizes
and interacts with its host plant went to waste
because the leafhopper didn’t show up.

But results continue to flow. One of Baldwin's 
most colourful papers, published in February 
this year, probes the dilemma of plants that
need to attract pollinators while remaining 
inconspicuous to herbivores. Nicotiana attenu- 
ata normally flowers at night, emitting the 
volatile benzyl acetone to attract hawk-moths. 
Unfortunately, hawk-moth larvae are also her- 
bivores, and the moths often leave their eggs on 
the leaves as they pollinate. When the plants 
become infested, the team found, they shut 
down production of benzyl acetone and open 
their flowers at dawn when the moths are gone. 
They are then pollinated by hummingbirds. 
Using a series of genetically modified strains, 
Baldwin’s team showed how the oral secretions 


from the munching hawk-moth larvae trigger
the dramatic switch in flowering time.

In Utah, 8,844 kilometres from their German lab, Baldwin and his team study genetically modified plants.

Baldwin's research has inspired attempts to 
develop new crop strains that could be practical 
for farmers in poorer countries who can’t afford
lots of pesticides and herbicides, says John 
Pickett, director of Rothamsted Research in 
Harpenden, UK, a historic agricultural research 
centre. The centre is working with some agri- 
cultural companies on field trials in the United 
States and United Kingdom of crops genetically 
modified to amplify chemical signals that plants 
make when they are under attack. Details are 
currently confidential, he says, but “we'll have 
big announcements in the next year or two”.

Back in Jena, at the end of the 2010 season, 
Baldwin is planning next year’s experiments 
with genetically altered strains designed to 
have elevated or suppressed emissions of volatile
signals, which he labels as the ‘screamers’
or the ‘mutes’. He will investigate how single
screamers planted among a colony of mutes, 
for example, might affect herbivore or preda- 
tor behaviour. 

Right now, Baldwin is 8,844 kilometres away 
from the next experiment at his barren, hostile 
field site. Human relations in the region are 
now a lot tamer. Relations between plants and 
insects, though, are as wild as ever.


Alison Abbott is Nature’s senior European
correspondent. 

Build life to understand it 


Biologists and engineers should work together: synthetic biology reveals how 
organisms develop and function, argue Michael Elowitz and Wendell A. Lim. 


This year’s publicity about Craig Venter
‘creating’ life, and this week’s report
on the promise and perils of synthetic 
biology from US President Barack Obama's 
commission on bioethics, threaten to obscure 
the most important impact of this field. Syn- 
thetic biology is redefining the discipline of 
biology and helping people reach a deeper 
understanding of how life works. 
Conventionally, biologists have sought 
to understand life as it exists. Increasingly, 
however, from stem-cell reprogramming to
microbial factories, researchers are describing
what is and exploring what could be. An
analogous shift occurred in physics and 


chemistry, especially in the nineteenth cen- 
tury. Like biology, these fields once focused 
on explaining observed natural processes 
or material, such as planetary motion or 
‘organic’ molecules. Now they study physi- 
cal and chemical principles that govern what 
can or cannot be, in natural and artificial sys- 
tems, such as semiconductors and synthetic 
organic molecules.

The expansion of biology from a discipline
that focuses on natural organisms to one that 
includes potential organisms (see ‘Beyond 
the natural’) will have three long-term effects. 
First, it will enlarge the community of biolo- 
gists to include researchers with different 


assumptions and goals, such as engineers. 
Second, it will alter the way in which scien- 
tists address the fundamental problem of how 
biological systems work. Integrating reverse 
and forward engineering approaches will free 
biologists to uncover fundamental principles 
that explain, unify and extrapolate beyond 
mechanisms observed in specific model sys- 
tems. Third, it will provide a new conceptual 
basis for teaching biology — one founded on 
stimulating inquiry from students as to how 
biological components and modules could be 
used to implement complex functions. 
Although traditional disciplinary boundaries
are dissolving, the cultural differences
between scientists and engineers remain
strong. For biologists, genetic modification is 
a tool to understand natural systems, not an 
end in itself. Thus, making biological systems 
‘engineerable’ — a goal of engineers in the 
field of synthetic biology — can seem point- 
less. Many biologists wonder why engineers 
fail to appreciate the intricate, beautiful and 
sophisticated designs that occur naturally. 
Engineers are often equally perplexed by 
biologists. Why are they so obsessed about the 
details of one particular system? Why don't 
they appreciate the value of replacing a com- 
plex and idiosyncratic system with a simpler, 
more modular and more predictable alterna- 
tive? These misunderstandings can make for 
fascinating conversations, but they can also 
prevent mutually beneficial synergies. 

Biologists and engineers need to appreci- 
ate the complementarity of their approaches. 
Below the surface, these two communities 
have common interests and goals that can, 
and must, be addressed from both directions 
— forward and reverse engineering. 


SAME CHALLENGES 

Traditional biologists seek to reverse 
engineer natural biological systems — to 
understand how their molecular circuitry, 
composed of interacting genes and proteins, 
gives rise to observed behaviour. Synthetic 
biologists seem to do the opposite. They 
forward engineer new behaviour using 
well-understood genetic components and as 
simple a design as possible. Both communi- 
ties face the same daunting challenge: how 
to relate the architecture of a gene circuit to 
its behaviour in a cell or tissue. 

Synthetic circuits can provide insights into 
natural circuit-design principles that would be 
difficult or impossible to obtain using conven- 
tional perturbations of natural systems alone. 
Consider signalling. Biologists have discov- 
ered that a handful of canonical pathways 
are used repeatedly across species, tissues 
and stages of development. What is it about 
this set of pathways that makes it sufficient 
for the development and physiological func- 
tion of a complex organism? To address this 
question requires an understanding of what 
each pathway can do. Synthetic biologists can 
systematically engineer a diverse range of 
signalling-pathway architectures and analyse 
them in relative isolation from any particular 
set of downstream processes. These architectures
may include natural, as well as new,
configurations. The results could provide a 
higher-level view of signalling in which one 
could associate each pathway and architecture 
with a specific functional repertoire, instead 
of thinking about them primarily in terms of 
their molecular interactions. 

A second example of where synthetic biol- 
ogy can provide a complementary approach 
is metabolic networks — one of the most 
active frontiers in the field. Biology has 


conventionally focused on understanding 
the metabolic pathways in particular organ- 
isms. Synthetic biology enables researchers 
to consider what types of metabolic net- 
works are possible by combining enzymes 
from all species. Such work has focused on 
engineering novel metabolic pathways that 
produce specific molecules for medicine and 
industry. These efforts can also address fun- 
damental biological questions. For example, 
what trade-offs exist between metabolic effi- 
ciency and flexibility? Are there fundamental 
principles for how cells set up their meta- 
bolic economy and synthesize and distribute 
key chemical precursors? These questions
could be important for understanding the 
diversity of metabolic networks in natural 
microbes as well as in biomedically impor- 
tant systems such as cancer, in which cells 
alter their metabolism.


BEYOND THE NATURAL

Synthetic biology extends the study of biological systems beyond those that exist. (Figure labels: evolutionary exploration; potential organisms; organisms; synthetic biology exploration.)


Synthetic-biology approaches may also 
provide insights in developmental biology. 
They could be used to tackle fundamental 
questions of what types of multicellular pat- 
terning processes are possible, and what 
types of circuits — combining signalling, 
regulation, differentiation and morphologi- 
cal change — would be sufficient to program 
the formation of organisms. 

Using well-characterized signalling path- 
ways, transcription factors and regulators 
of cell morphology and division, it should 
become possible to explore a range of natu- 
ral and non-natural developmental circuit 
architectures. This would start with very 
simple patterns that could be generated in 
relative isolation from other developmental 
processes in the simplest systems. Eventu- 
ally, synthetic developmental systems should 
yield a deeper understanding of morpho- 
logical programming, provide insights into 
natural developmental systems and possibly 
enable applications in tissue engineering. 

The convergence of engineering and 
biology could bring exciting new ways of 
teaching biology. Conventional biology, 
focused on understanding the structure, 
mechanism and origins of extant beings,
tends to involve memorizing nomenclature 
and facts. In some cases, this approach can 
obscure unifying principles and concepts. 

Instead, teachers could start by challenging 
students with the question: ‘how might you 
build a biological system that performs a 
particular function?’ Students could be 
asked to deduce underlying design princi- 
ples — for example, to identify the general 
types of circuit modules necessary or suf- 
ficient to implement a given behaviour in 
cells. Students thus equipped with organ- 
izing concepts could better navigate the sea 
of confusing nomenclature in natural living 
systems. Inconsistencies between ideal- 
ized designs and actual examples would 
raise important questions about assumed 
functions, and about constraints inherent 
to the evolutionary process. In physics 
and engineering, this kind of approach 
is commonplace and can be effective at 
engaging and motivating students. 

Such concepts could be introduced to teen- 
age students who are just starting to think 
more deeply about the mechanisms under- 
lying plants and animals. Requiring theory, 
computation and experiment would better 
equip students for multidisciplinary research. 
It would also expose them, at an earlier stage, 
to the conceptual and creative aspects of the 
scientific process, potentially attracting a 
broader range of people to biology. 

Many technical and fundamental obstacles 
remain before the design and construction 
of synthetic biological systems can become 
routine. And as discussed in the commis- 
sion report, the societal challenges may 
be equally formidable. Bringing together 
the energies and expertise of diverse 
communities that think about biological 
problems in different terms is a good first 
step towards taking full advantage of the 
many opportunities that lie ahead.
The polar bear (possible hybrid pictured) is one of several species vulnerable to hybridization.


The Arctic melting pot 


Hybridization in polar species could hit biodiversity hard, 
say Brendan Kelly, Andrew Whiteley and David Tallmon. 


In 2006, a white bear with patches of
brown fur was shot by hunters in the
Arctic. DNA tests confirmed what many
suspected: it was a hybrid of a polar bear
and a grizzly. A media frenzy quoted biologists
as saying that although they knew in
theory such cross-breeding could happen,
they didn’t expect to see it in the wild. In
2010, another hybrid was killed by a hunter in
the western Canadian Arctic. This time, the
animal was a second-generation cross: its
mother was a hybrid and its father a grizzly.
More cases are probably out there.

Biologists should not be surprised. There
have been hints of Arctic hybrids before. In the
late 1980s, a whale thought to be a narwhal–beluga
mix was found in west Greenland.
In 2009, an apparent bowhead–right-whale
hybrid was photographed in the Bering Sea,
between Alaska and Russia. Dall’s porpoises
are known to be mating with harbour porpoises
off the coast of British Columbia, and
seal hybrids have been identified in museum
specimens and in the wild.

These are just the first of many hybridizations
that will threaten polar biodiversity.
Rapidly melting Arctic sea ice imperils
species through interbreeding as well as
through habitat loss. As more isolated populations
and species come into contact, they
will mate, hybrids will form and rare species
are likely to go extinct. As the genomes of
species become mixed, adaptive gene combinations
will be lost.

Researchers have little idea how much
hybridization is occurring, let alone how
it will affect populations. Plans must be
developed immediately to monitor the genetics
of Arctic animals and to deal with hybrids
before currently discrete populations merge
and at-risk species are bred out of existence.

We have counted at least 34 possible 
hybridizations between discrete populations, 
species and genera of Arctic and near-Arctic 
marine mammals (see Supplementary Infor- 
mation). Of the 22 species involved, 14 are 
listed — or are candidates for listing — as 
endangered, threatened or of special con- 
cern by one or more nations. Twelve cases 
are of hybridization between different spe- 
cies — half involving crosses between what 
are normally classified as distinct genera. 
Twenty-two cases involve isolated popula- 
tions at risk of intra-species mixing, nine of 
which are classified as distinct subspecies. 

The Arctic Ocean is predicted to be 
ice-free in summer before the end of the 
century, removing a continent-sized barrier 
to interbreeding. Polar bears are spending 
more time in the same areas as grizzlies; seals 
and whales currently isolated by sea ice will 
soon be likely to share the same waters. 

Not all cross-species matings will produce 
viable, or indeed any, offspring. But the
chance is enhanced in Arctic marine mammals,
because their number of chromosomes has
changed little over time. There is evidence of
hybridization across species (such as between 
spotted and harbour seals) as well as across 
genera (such as harp and hooded seals). 

Hybridization is not necessarily a bad thing. 
It has been an important source of evolution- 
ary novelty. For example, a new species of 
chub originated in the Colorado River before 


the presence of humans, from the hybridiza- 
tion of two other species. But hybridization 
driven by human activities tends to occur 
quickly and to reduce genomic and species 
diversity. When mallard ducks were intro- 
duced to New Zealand in the 1860s, they 
began mating with native grey ducks. Now 
few, if any, pure native populations remain. 

Diversity loss may be minor if, say, North 
Pacific and North Atlantic minke whale 
subspecies interbreed in an Arctic with 
diminished ice. Other crosses will be more 
problematic. Interbreeding between the 
North Pacific right whale, of which there 
are probably fewer than 200, and the more 
numerous bowhead whale could quickly 
push the former to extinction. If polar bears 
survive climate change in secluded refuges 
— which is far from certain — interbreeding 
could be the final straw. 

Cross-breeding might affect social
and ecological interactions. The apparent
narwhal–beluga hybrid discovered in
Greenland had teeth combining qualities of
each species, but lacked the narwhal’s tusk,
an important determinant of narwhal breeding
success. Polar–grizzly hybrid bears in a
German zoo exhibited behaviour associated
with seal hunting, but not the strong swimming
abilities of polar bears. First-generation
crosses can have ‘hybrid vigour’, but later
generations are likely to be less fit than their
ancestors (‘outbreeding depression’).

The International Union for Conservation
of Nature should develop a comprehensive
policy for managing hybrids,
including determining when it is practical 
to prevent or limit hybridizations. Red wolf 
and coyote hybrids, for example, have been 
culled in the United States in the past decade 
to help preserve distinct species. 

Researchers should combine models of 
sea-ice loss, oceanography and landscape 
genomics to predict when and where hybrid- 
ization is most likely, and to monitor the 
genetics of at-risk populations. National and 
tribal governments should work together: 
some indigenous groups actively monitor 
the harvests of Arctic marine mammals, 
and they could collect genetic samples in 
remote areas. The rapid disappearance of 
sea ice leaves little time to lose.
Further reading and Supplementary Information
accompany this article at go.nature.com/h4bksj


16 DECEMBER 2010 | VOL 468 | NATURE | 891 


© 2010 Macmillan Publishers Limited. All rights reserved 




| COMMENT | BOOKS & ARTS 


Thomas Edison invented and patented many early film technologies, allowing him to control the industry. 


Fighting monopolies


A history of communications technologies holds 
lessons for the Internet today, finds Li Gong.


Innovation and business interests do not
always mesh. Thomas Edison, the inventor
of the light bulb and the phonograph,
almost suffocated the US film industry in the
early 1900s by controlling all the crucial patents
for film technology. His Motion Picture
Patents Company (also called the Edison
Trust) dictated film length, style, content,
who could show films and at what price.

In 1934, New Jersey’s Bell Laboratories, the
birthplace of the semiconductor, suppressed
development of its magnetic tape and the
answering machine for six decades to protect
the telephone business of its corporate
parent, AT&T, which feared that the recording
of conversations would deter people from
using telephones. The arrival of fibre optics,
mobile phones, faxes and speakerphones
was similarly delayed.

In his groundbreaking book, The Master
Switch, Columbia University law professor 
Tim Wu weaves together these and other 
examples to examine how disruptive tech- 
nologies enter and develop within society. 
The new industries that emerge, he argues, 
progress in a cycle: companies grow to 
become empires, which close the field until 
the next wave of technology arrives to dis- 
mantle the existing order. 

Wu covers the histories of radio, music, 
film, television and the Internet. All are lit- 
tered with examples of vested interests that 
have thwarted competition and reduced 
innovation through commercial, political, 
legal and regulatory pressures. Drawing on 
their substantial war chests, large companies 
can lobby hard. For example, one telecom- 
munications giant persuaded the state of 
Texas in 1995 to pass a
law requiring that any 
would-be companies 
must build phone lines 
that reach at least 60% 
of homes and busi- 
nesses, a measure that 
shut out new competi- 
tors. Another tactic is 
to charge exorbitant rental fees for facilities
owned by large companies.

The Master Switch: The Rise and Fall of Information Empires

Monopolistic powers also restrict freedom
of expression
and civil liberty. For 
example, after decid- 
ing arbitrarily that films more than a few 
minutes in length were uninteresting, the 
Edison Trust refused to license longer feature 
films. Studios such as Paramount Pictures, 
Fox and Universal sprang up in rebellion. 
Hollywood grew partly as a result of its prox- 
imity to Mexico, where independent film- 
makers could escape from injunctions and 
subpoenas coming from controlling interests 
on the US East Coast. 

On taking over ownership of the film 
industry, these few studios soon applied their 
own censorship. Bowing to pressure from 
Catholic activists to uphold moral values 
on screen, the film-industry bosses in 1934 
agreed to abide by a production code, known 
as the Hays Code. Named after campaigner 
William Hays, president of the Motion 
Picture Producers and Distributors of 
America, the set of rules specified what was 
considered obscene. For decades it restricted 
what the public could view. 


Knopf: 2010. 384 pp. 
$27.95 


PHONE WARS 
Democracy is also influenced by the 
narrow ownership of telecommunications. 
For example, in the US presidential elec- 
tion of 1876, news monopoly Associated 
Press supplied content for communications 
monopoly Western Union, which leaked 
rivals’ confidential telegrams to its favoured 
candidates. In the past decade, the US 
government has been accused of secret wire- 
tapping, made possible by the concentration 
of telecommunications in a few hands. Thus 
there is constant combat over the ownership 
of communication technologies — between 
open and closed, decentralized and central- 
ized models. 

The latest battlefield is the Internet. Wu 
stresses that, because it is so crucial to our 
society, we must prevent the monopolistic 
cycle that might close this diverse, distrib- 
uted, decentralized and 
democratic system. 
Such attempts to gain 
broad control have so 
far failed — witness 




the ill-fated merger, now dissolved, between 
media giant Time Warner and Internet 
service provider AOL. But Wu is wary of the 
rise of ‘closed’ platform devices that restrict 
what programs can be used, such as Apple’s 
Mac, iPad and iPhone, compared with open 
systems, such as the earlier Apple II. He 
quotes Tom Conlon writing online in Popular 
Science: “Once we replace the personal com- 
puter with a closed-platform device such as 
the iPad, we replace freedom, choice, and the 
free market with oppression, censorship, and 
monopoly.” 

Central in keeping the Internet open is 
the concept of ‘network neutrality’, which
Wu has popularized: government and infor- 
mation carriers should place no restriction 
on where, when, what and how users access 
information. A requirement of net neutrality 
is that Internet service providers should not 
use price differentiation to fend off upstarts, 
to favour their collaborators, or to retain 
their monopolistic power in new or adjacent 
fields. But opinions are varied and examples 
to the contrary abound: the US cable firm 
Comcast allegedly levied additional fees for 
video traffic from companies that compete 

with its cable business, for example.
The British government has also recently
announced support for a two-speed Internet.

Wu believes that the antiquated competition laws
that focus on pricing to protect consumers 
are inadequate in the information industry, 
because collusion restricts choices but does 
not always inflate prices. Rather than legislation,
he proposes a ‘separations principle’,
whereby vital components of the informa- 
tion industry are entrusted to different 
institutions, both public and private. These 
bodies would apply checks and balances to 
ensure that control is not given to only a few 
players. Such an idea is attractive, yet will 
undoubtedly be difficult to put into practice 
because of vested interests. 

The Master Switch offers powerful lessons 
from the past for the future of the Internet. 
Should we let it evolve along its natural 
trajectory, and risk it becoming temporarily 
controlled by monopolies until the next 
breakthrough? Or should we intervene to pro- 
tect innovation and a free and open Internet at 
any expense? Perhaps, though, we don't have 
as much control as we think. Wu cites ancient 
Chinese wisdom from Luo Guanzhong: “An 
empire long united, must divide; an empire 
long divided, must unite. Thus it has ever 
been, and thus it will always be.”


Li Gong is chief executive of Mozilla China. 
e-mail: lgong@mozilla.com 


Books in brief 


The Genesis of Science: The Story of Greek Imagination
Stephen Bertman PROMETHEUS BOOKS 304 pp. $27 (2010)


The origins of science in ancient Greece are explored by classicist 
Stephen Bertman. He looks beyond the familiar names such 

as Euclid and Pythagoras to lesser-known figures, including the 
mapmaker Anaximander and alchemist Maria the Jewess, popularly 
known for inventing the eponymous bain-marie water bath and 
various pieces of chemical apparatus, including the still. Bertman 
argues that the Greeks owe their scientific success to their belief in 
an ordered Universe, the rules of which could be unpicked by the 
human mind. 


Hunger: The Biology and Politics of Starvation 

John R. Butterly and Jack Shepherd DARTMOUTH COLLEGE PRESS 

360 pp. $29.95 (2010) 

One in seven of the world’s population is short of food. Lack of 
political will is the main reason for not addressing hunger, explain 
medical scientist John Butterly and environmental scientist Jack 
Shepherd. As well as describing the biology of human nutrition and 
famine, they examine the political and historical factors that cause 
hunger and malnutrition to remain major health problems today 
despite advances in science and technology and the proliferation of 
humanitarian efforts. 


Longevity and the Good Life 

Anthony Farrant PALGRAVE MACMILLAN 256 pp. $85 (2010) 

Living longer may not be such a good thing, cautions bioethicist 
Anthony Farrant. Although breakthroughs in medical biotechnology 
have the potential to extend our lives and make them healthier, he 
disputes the idea that immortality is desirable and cautions that 
the ready availability of such enhancements will diminish the value 
we put on reaching old age. Increasing longevity will challenge the 
fair distribution of resources, especially health care. Ultimately, he 
says, these pressures will undermine the idea that all people are 
fundamentally equal, and thus threaten the good life. 


Man and Woman: An Inside Story 
Donald W. Pfaff OXFORD UNIVERSITY PRESS 232 pp. £15.99 (2010) 
Gender differences have deep and tangled roots, according to 
neuroscientist Donald Pfaff. Although genetic and biological factors
such as neuroanatomy contribute to this dichotomy, he argues,
they do not dominate. Cultural influences, including experiences of
stress throughout various stages of our lives, may be just as large
and affect males and females in varied ways. Differences between
the sexes, both physical and mental, result from a combination of
genetics and environment that operates on many levels to influence
behavioural mechanisms. 


Lessons Learned: Reflections of a University President 

William G. Bowen PRINCETON UNIVERSITY PRESS 168 pp. $24.95 (2010) 
William Bowen reflects on the lessons he learned while he was 
president of Princeton University in New Jersey from 1972 to 1988, 
and president of the Andrew W. Mellon Foundation in New York from 
1988 to 2006. He shares advice on fund-raising, hiring, managing 
faculty members and interacting with trustees. And he reveals his 
experience of shepherding the elite university through the civil-rights 
movement and the Vietnam War, a period during which he helped to 
expand the faculty, especially in the life sciences. 
Eels are revered as gods by some cultures, including that of the Maori — as depicted in this wall mural in Canterbury, New Zealand.


| ECOLOGY | 


The mystery of eels 


Kim Aarestrup is reminded of how little we know about these endangered fish. 


Slimy and nocturnal, eels are mysterious
creatures. They spawn in remote
and nutrient-poor places in the seas, and
no human has ever seen one reproduce in the
wild. Their rice-sized hatchlings embark on
an odyssey of up to 6,000 kilometres to find
fresh or brackish water, where they grow for
decades, reaching weights of more than
20 kilograms, only to return to the sea, where
they spawn, die and sink into the abyss.

Exploited as food for millennia owing to
their abundance, taste and high energy content,
eels cannot yet be cultured profitably.
All traded eels are wild, and populations
are plummeting. Species in temperate areas,
including the American, Japanese and European
eel, have become scarce, with populations
dropping by more than 90% in the past
four decades. European eels are now listed
as critically endangered by the International
Union for Conservation of Nature, a shocking
development for a fish once found across
all the accessible waters of Europe.

In Eels, naturalist James Prosek travels
and interviews leading scientists worldwide
to examine the Anguilla genus. By broaden- 
ing the perspective beyond Atlantic species, 
his book complements Tom Fort’s marvel- 
lous The Book of Eels (HarperCollins, 2002). 
As well as describing the biology of the eel, 
Prosek considers the 
cultural and economic 
value we attach to it, 
interweaving historic 
vignettes from Aristo- 
tle’s interest in the ori- 
gins of European eels 
to Sigmund Freud’s 
nineteenth-century 
search for their testes. 

Eels: An Exploration, from New Zealand to the Sargasso, of the World’s Most Mysterious Fish
JAMES PROSEK
HarperCollins: 2010. 304 pp. $25.99

Prosek visits New Zealand, where the
Maori revere the large endemic longfin eel
Anguilla dieffenbachii as a religious symbol,
which they believe can bark like a dog
and scream like a baby. Large road projects
have been diverted to avoid areas populated 
by taniwha, or special guardian eels. Prosek 
goes to Japan, a nation that eats vast quan- 
tities of eel, making the traditional dish 
kabayaki a multimillion-dollar industry. He 
also visits the Micronesian island of Pohn- 
pei, where Anguilla marmorata is sacred, 
believed to be the islanders’ ancestor. 

Restoration of eel populations will be diffi- 
cult. Prosek lists contributors to their decline: 
loss of habitat, dams, fishing, introduction of 
parasites, pollutants and changes in ocean 
currents. These factors, and our lack of knowl- 
edge about key stages of the eel life cycle, make 
population management problematic. 

The plight of temperate species has led to 
a surge of eel research in the past few years. 
Recent papers have described captures of 
Japanese eels that have spawned, showing that 
they do so in tropical ocean frontal zones, a 
mixing zone between warm and cold oceanic 
waters. Other research has revealed the diet of 
newly hatched eel larvae (called leptocephali) 


and suggested alternative larval migration 
routes for European eels other than the North 
Atlantic drift current. Prosek’s book stops 
short of capturing these emerging results. 

For American and European eels, monitor- 
ing the late stages of their life cycle in the Atlan- 
tic is the greatest challenge. Only by assessing 
survival rates can we focus remedial action on 
the most important life stages. The difficulty 
of tracking small animals over vast distances 
is immense — attached telemetry devices that 
measure and transmit data are currently the 
only feasible method of following adult eels 
across the ocean. The miniaturization of trans- 
mitters in the coming years should advance 
knowledge considerably. More information on 
tropical eel species is also needed, as we know 
even less about them — a new species was 
even discovered recently in the Philippines 
(Anguilla luzonensis) — and different factors 
will affect their survival. 

Eels is a solid introduction to global Anguilla
species. It provides a convincing argument 
that eels should be preserved because of their 
unique life cycle, and their economic and 
cultural importance. To restore and manage 
eel populations worldwide, we need a deeper 
understanding of their life history.


Kim Aarestrup is a senior scientist in the 
National Institute of Aquatic Resources at 
the Technical University of Denmark, 
8600 Silkeborg, Denmark. 

e-mail: kaa@aqua.dtu.dk 
Biodiversity as a bonus prize


Rare species and ecosystem services make uneasy 
bedfellows, discovers Emma Marris. 


As biologist Ken Thompson explains
in Do We Need Pandas?, conserving
rare species does not really benefit
people. If you care about nature because 
of its usefulness to humanity, pandas are a 
luxury item — and so are most other rare 
species. The money that is spent on saving 
them could be better applied by protect- 
ing ecosystems that provide us with food, 
timber, clean water, a liveable climate and 
flood protection. 

If one’s aim is to prevent extinctions, as 
in much of traditional conservation, then 
identifying and fussing over endangered 
species is the best way forward. If one sees 
the environment as a source of services, as
Thompson does, the more sensible course 
is to “conserve the fabric of whole ecosys- 
tems, and let the rare species look after 
themselves”. 

The reason, he explains, is partly because 
rare species are too sparse to significantly 
influence the functioning of an ecosystem.
They are thus unlikely to be essential
for the continued provision of ecosystem
services.

Thompson traces conservation scientists'
failed attempts to prove that biodiversity
is inherently good for ecosystems.
First, the results of these experiments 
— typically using small plots containing 
manipulated numbers of plants — were 
not what they were cracked up to be. Yes, 
more-diverse ecosystems were more pro- 
ductive on average. But this was not a result 
of their variety alone — it was because they 
were also more likely to include the most 
productive plant, monocultures of which 
could be even more productive. Second, 
experimenters defined productivity in 
terms of turning sunlight into biomass. 
Yet growth need not tally with value: in
lakes, high productivity often means more
algae and fewer fish.

NATURE.COM
For a review on protecting the panda, see:
go.nature.com/3dk6od

Do We Need Pandas? The Uncomfortable Truth About Biodiversity
KEN THOMPSON
Green Books: 2010. 160 pp. £9.95

Thompson proposes that we give up the
goal of maximizing biodiversity. Instead,
we should focus on saving whole ecosystems
that are useful for humanity. In the process
of conserving such areas, biodiversity will be
protected anyway, as a sort of bonus prize.
But by putting the focus only on what
nature can do for us, Thompson leaves
open the possibil- 
ity that ecosystems that do not deliver 
sufficient services might be thrown out, 
with all the biodiversity that they contain. 
He admits that society has benefited from 
the turning over of forests and wetlands 
to agriculture: “It is only because of such 
conversion that you and I have enough to 
eat.” But he does not support conversion of
any of the remaining wild habitat. Others 
disagree: some economists might argue 
that a particular wild patch would provide 
better services to humanity as pasture or 
plantation. This is the peril of the ecosys- 
tem-services model. Hitch your wagon to 
it, and when conversion provides better 
services than protection, your biodiversity 
bonus is cancelled. 

Despite his book’s provocative title, 
Thompson does not claim that we don’t 
need pandas. Like most ecosystem serv- 
ices enthusiasts, he is keen to have his 
economic pragmatism and his emotional 
love of nature too. Letting the panda go 
extinct would be “a profound failure for 
our stewardship of the natural world”, he 
feels. But he cannot have it both ways. 
If the ecosystems in which pandas live 
do not provide economically valuable 
services to humanity, then it is goodbye 
panda.


Emma Marris writes for Nature from 
Columbia, Missouri. 
Some illegal drugs such as heroin have been demonized only since the mid-nineteenth century.


Opiates for the people 


W. F. Bynum applauds an open-minded exhibition on 
the history of recreational drugs. 


The exhibition High Society may not
alter your mind, but it will broaden
your thinking. The ubiquity of
recreational drug use in all times and all
cultures is the focus of this show at London's 
Wellcome Collection. On offer is a varied 
combination of artefacts, artworks, books 
and videos tackling pharmacology, the 
drug trade, self-experimentation, collec- 
tive intoxication and the ethics of abusing 
these substances. 

Opium and its products pop up again 
and again: the largest object in the exhi- 
bition is a gigantic opium pipe from the 
late nineteenth century. The dominance 
of opium is unsurprising, as the poppy 
was cultivated for its sap at least as early as 
3000 BC. Both opium and alcohol have been
intimately connected with humanity for
millennia. However, cocaine, mescaline,
LSD, tobacco and even betel nuts also
get their due. Coffee is here, too: the only
drug available for testing in situ, at the
Wellcome Collection's café.

High Society: Mind-Altering Drugs in History and Culture
Wellcome Collection, London.
Until 27 February 2011.

One man’s stimulant is another man’s 
poison. This exhibition takes a non-judge- 
mental stance, making no distinction 
between legal thrills — such as those deliv- 
ered by coffee, alcohol and tobacco — and 
illegal ones. In any case, as the historical 
and anthropological arcs of the displays 
show, the legal issues are time- and culture-dependent.
Opium was demonized in the
West only from the mid-nineteenth century; 
marijuana is always contentious; and many 
moral or health crusaders would today ban 
tobacco, alcohol or both. 

Running throughout the exhibition is the 
striking documentation of the fine line sepa- 
rating pleasure and pain, ecstasy and vacant 
dependency. This is most explicit on a wall of 
19 photographs by Tracey Moffatt from 1999,
entitled Laudanum. The images — depicting 
a woman and her maid experiencing hallu- 
cinations in various guises and poses, some 
erotic, some melancholic and some down- 
right disturbing — convey a sense of what it 
would be like to witness such an event. 

Context is important in experiencing 
mind-altering drugs. Social cohesion and 
rites of passage are linked to some halluci- 
nogens, as the exhibition's images and films 
reveal. Ayahuasca, a psychoactive brew made 
from Banisteriopsis vines, is shown being 
drunk by the Tucano people of the Colom- 
bian Amazon. Peyote, a cactus containing 
mescaline among a range of alkaloids, is
dried and chewed by the Huichol and other 
indigenous peoples of Mexico. The resin 
from the bark of the virola tree is snorted 
like snuff during a ceremony in a Venezuela 
palm festival. These colourful rituals stand 
in contrast to less convivial Western gather- 
ings, including a counterculture photograph 
of ‘4:20 Day’, when 10,000 people gathered on
20 April 2008 for a mass ‘smoke-in’ of cannabis
at the University of Colorado, Boulder, to
protest against the drug’s illegal status.

Medical science also gets a look in. Swiss 
chemist Albert Hofmann’s description of 
LSD is on display, as well as classic texts 
about clinical uses of opium and its derivatives.
In a recorded interview, British
neuroscientist Barry Everitt describes the latest
research on neural networks that is helping 
us to understand addiction and prevent 
relapse. He explains how addiction may be 
a learned behaviour that is intimately tied to 
memory retrieval. 

A display on the economics of drugs 
shows some striking figures. The annual 
expenditure by the US government on its 
‘War on Drugs’ equals the annual worldwide
income of the Roman Catholic Church, or 
the amount Americans spend each year on 
complementary medicine. And each of these 
exceeds the yearly international market for 
antidepressants. 

The centre of gravity of this rewarding 
exhibition seems to lie in the ‘swinging
sixties’. If you didn't experience that era first
hand, as I did, this is your chance to find out
what all the hype was about.


W. F. Bynum is emeritus professor of the
history of medicine at University College 
London, UK. 

e-mail: w.bynum@ucl.ac.uk 


CORRESPONDENCE


Economic growth: 
indicators not targets 


Peter Victor questions the 
merits of economic growth in 
developed countries (Nature 
468, 370-371; 2010). In such 
discussions, it is important
to avoid confusing indicators
with optimization targets. An 
indicator that may be useful for 
evaluating an economy could be 
harmful when used as a target 
to improve the state of the 
economy. 

An economic indicator, such 
as gross domestic product 
(GDP) or the genuine progress 
indicator (GPI), is a number 
that quantifies a particular 
aspect of an economy. Indicators 
are useful for comparing 
different economies or for 
monitoring development. But 
they are overly simplistic in that 
they ignore all non-quantifiable 
aspects of living. 

This flaw becomes crucial 
when an indicator is turned 
into an optimization target. 
Politicians will quickly identify 
and exploit mechanisms that are 
likely to increase the indicator, 
even if there is no benefit for 
society. 

Measures that would not even 
be considered in the absence of 
a specific optimization target 
can then become political 
priorities when that target 
is adopted. Public debt is 
one such indicator that has 
recently become a high-priority 
optimization target in many 
European countries, despite 
wide recognition of the socially 
negative effects of the cuts that 
are needed to reduce it. 

This will happen to any 
indicator, including the GPI, 
which takes into account 
social and environmental 
factors as well as economic 
ones. Economists and 
politicians must accept that 
no single number can safely be 
optimized. Several indicators 
that concentrate on different
aspects of society need to
be used in parallel, and any
measure that improves one 
while decreasing another must 
be recognized as a compromise 
between conflicting goals. 
Konrad Hinsen Centre de 
Biophysique Moléculaire (CNRS), 
France. 
konrad.hinsen@cnrs-orleans.fr 


Economic growth: 
enough is enough 


There is substantial evidence 
that further economic growth 
in wealthy nations is neither 
sustainable nor desirable. It is 
indeed time, as Peter Victor 
writes (Nature 468, 370-371; 
2010), to answer key questions 
about what a non-growing 
economy would look like in 
practice. We need a new macro- 
economics for sustainability, and 
we need it now. 

On 17 November a report was 
released in the United Kingdom, 
entitled Enough is Enough: Ideas 
for a Sustainable Economy in a
World of Finite Resources (see 
go.nature.com/hv52np). The 
report brings together the ideas 
generated at the first Steady State 
Economy Conference held in 
June this year in Leeds, UK. It 
discusses policy proposals in ten 
key areas needed to achieve a 
no-growth economy. Proposals 
include policies to limit resource 
use, reduce income inequality, 
reform the monetary system, 
change consumer behaviour, 
restructure business, secure 
full employment and improve 
the way in which we measure 
progress. 

A growing number of 
economists, scientists and 
policy-makers are beginning 
to understand the urgent need 
for an economic model based 
on stability instead of growth 
(see go.nature.com/f8ig8s). A 
combination of further research 
into the steady-state model and 
bold action to turn this model
into government policy is
required to achieve well-being 
for everyone within ecological 
limits. 

Daniel W. O’Neill Center for the 
Advancement of the Steady State 
Economy, Leeds, UK. 
dan_oneill@steadystate.org 


Coordinate green 
growth 


Green economic growth needs 
a shared sense of direction if it 
is to lead to a more sustainable 
future under climate change. 
Studies on green innovation and 
societal transformation show 
that uncoordinated initiatives 
are unlikely to be an effective 
way “to get the ball rolling and to
‘learn by doing’” (Nature 468,
477; 2010). 

First, socio-technical 
transformations, such as the 
transition from fossil fuels to 
renewable-energy sources, 
will require several decades 
to complete. Speeding up 
this process needs focus 
and coordination at the 
international level. 

Second, the learning curves 
for creating energy-efficient and 
renewable-energy technologies 
are global. Here, coordination 
will be necessary to determine 
cost reductions and to increase 
performance. 

Third, green growth calls 
for major shifts in the way in 
which economies are organized. 
It is not trivial to align the 
interests of fossil-fuel-intensive 
incumbent industries and their 
supporting power structures 
with the interests of emerging 
‘green’ industries. Again, 
coordination will be necessary 
to overcome the resistance 
to change in incumbent 
production and consumption 
systems. 

Floortje Alkemade, Marko 
Hekkert Utrecht University, the 
Netherlands. 
falkemade@geo.uu.nl 


A slip in the date of
DNA’s discovery 


In her review of Anna Ziegler’s 
play Photograph 51, Josie 
Glausiusz refers to DNA’s 
“discovery” in 1953 (Nature 
468, 375; 2010), when this was 
in fact the year its structure was 
solved. The molecule itself was 
discovered almost a century 
earlier. 

It was a young Swiss 
physician, Friedrich Miescher, 
who stumbled on DNA in 
1869, naming it nuclein. He 
realized that it chemically 
defines the nucleus — an 
enigmatic organelle at that 
time — and identified the 
molecule in a wide variety 
of cell types, including germ 
cells. He determined DNA’s
elementary composition and 
basic biochemical properties, 
and suggested that it could be 
important in cell proliferation, 
realizing it was synthesized 
before cell division. 

Miescher developed theories 
on the basis of these findings to 
explain DNA’s function in terms
of fertilization and heredity, even 
proposing how macromolecules 
might encode information. His 
work also stimulated others 
to investigate DNA and its 
function. 

Miescher should therefore 
be remembered not just as the 
discoverer of DNA, but also 
as the founder of molecular 
genetics. 

Ralf Dahm Institute of Molecular 
Biology, Mainz, Germany. 
r.dahm@imb-mainz.de 
OBITUARY


Allan Sandage 


(1926-2010) 


Astronomer who measured expansion rate of the Universe. 


Allan Rex Sandage was one of the most
influential astronomers of the second half
of the twentieth
century. Edwin Hubble and Walter Baade 
both left their scientific papers to him, and 
he continued the work of these two giants 
with spectacular results — including the 
first good estimates of the Hubble constant 
and the age of the Universe. 

These are his best known achievements, 
but they comprised only a small part of his 
publications, which number in excess of 500. 
In 1956, Sandage became a staff member of 
the Mount Wilson and Palomar observa- 
tories in California; his retirement almost 
50 years later did not stop him from working 
until the very end — his last paper is still in 
the press. He died, aged 84, on 13 November 
2010 at his home in San Gabriel, California, 
from pancreatic cancer. He is survived by his 
caring astronomer wife Mary Sandage and 
his two sons David and John. 

Sandage was born on 18 June 1926 in Iowa 
City, Iowa, as the only child of a powerful 
father — a professor of advertising — and 
loving mother. He received his BA in 1948 
from the University of Illinois at Urbana- 
Champaign and then moved to the California 
Institute of Technology in Pasadena, Califor- 
nia, for his PhD. Here he became Hubble's 
assistant in 1952, a year before completing 
his degree and Hubble’s sudden death. He 
chose Baade as his thesis adviser, and Baade 
went on to teach him all the intricacies of the 
observing techniques of the time. This made 
Sandage, who spent roughly 2,000 nights at 
the telescope during his lifetime, an outstand- 
ing observer. He published the famous book 
The Hubble Atlas of Galaxies, followed by a 
further two colossal atlases containing some 
of the best ground-based pictures of galaxies 
ever taken. 


EXPANDING THE UNIVERSE 

From the very beginning of his career, 
Sandage was a star. His 1953 PhD thesis 
reversed the thinking of the day that faint 
‘main sequence’ stars started their lives as red
giants; his measurements of the M3 globular 
cluster led to the conclusion that the oppo- 
site order was correct. This was a revolution 
in the understanding of stellar evolution. 
Sandage continued to work on determining 
the distances and ages of clusters, as well as 
on the properties of their variable stars — the 
RR Lyrae and Cepheid stars — all his life. 
He was one of the first to use supernovae
to measure very large distances, and led a
Hubble Space Telescope team to calibrate 
their luminosity. 

By 1958, Sandage had massively revised 
Hubble’s estimates of galactic distances, 
increasing them by a factor of about 7. From
this he determined an expansion rate of the 
Universe, otherwise known as the Hubble 
constant, of about 75 kilometres per second 
per megaparsec, and an age of the Universe 
of around 13 billion years. Today’s best 
estimates are, remarkably, essentially the 
same — albeit with smaller errors. 
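
The link between the two figures quoted above can be checked with simple arithmetic: the reciprocal of the Hubble constant (the 'Hubble time') sets a rough age scale for an expanding Universe. The sketch below is only an illustration of that arithmetic, not a reconstruction of Sandage's analysis; the function name and constants are assumptions introduced here, and a real age estimate also depends on the cosmological model adopted.

```python
# Hubble time: the rough age scale 1/H0 implied by an expansion rate H0.
# Illustrative only; names and constants are not taken from the article.
KM_PER_MPC = 3.0856775814913673e19   # kilometres in one megaparsec
SECONDS_PER_GYR = 3.15576e16         # seconds in 1e9 Julian years


def hubble_time_gyr(h0_km_s_mpc: float) -> float:
    """Return 1/H0 in billions of years for H0 given in km/s/Mpc."""
    if h0_km_s_mpc <= 0:
        raise ValueError("H0 must be positive")
    seconds = KM_PER_MPC / h0_km_s_mpc   # 1/H0 expressed in seconds
    return seconds / SECONDS_PER_GYR


print(round(hubble_time_gyr(75.0), 1))   # prints 13.0
```

With 75 kilometres per second per megaparsec the Hubble time comes out at about 13 billion years, consistent with the age quoted in the text.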


Perhaps his seminal work was his 1961 
paper ‘The ability of the 200-inch telescope
to discriminate between selected world
models’, which has become the basis of
modern observational cosmology. In it, 
Sandage calculated what the past and future 
would look like under different models of 
an expanding Universe and predicted the 
consequences for an observer. 

On the back of these predictions, Sandage 
single-handedly mounted a giant programme 
to extend the Hubble diagram, which charts 
redshifts of galaxies (a measure of how fast 
they are moving away from Earth) against 
their relative distances. This became the 
crucial piece of evidence that helped to dis- 
pel doubts about whether very large redshifts 
are really caused by cosmic expansion, or by 
some as-yet-unknown physics. Such doubts, 
fostered by Hubble himself, came to prevail
when the stunningly large redshifts of
quasars were discovered in 1963. Sandage’s 
work was instrumental in settling the debate 
in favour of an expanding Universe. 


QUIET DISCOVERIES 

Sandage was actively involved in the 
discovery of quasars — enigmatic objects 
first detected by radio astronomers in 
the late 1950s. Sandage provided some 
of their first optical identifications and 
the first spectrum. He also co-discovered 
radio-quiet quasars, and showed that these 
objects greatly outnumber their radio-loud 
counterparts. In 1963, he co-wrote a paper 
on violent processes in galactic centres that 
anticipated today’s explanation of quasars as 
being very distant galactic nuclei powered 
by black holes. 

Sandage’s most cited paper is from 1962, 
in which he theorized how the pancake- 
shaped Milky Way was formed by the 
collapse of a spherical gas cloud. This work 
remains the basis of modern theories of 
galaxy formation. 

Sandage also authored several essays on 
the history of modern astronomy, including 
a monumental history of the Mount Wilson 
Observatory. 

In mid-career, Allan became deeply con- 
cerned about the meaning of life. He studied 
the Bible and spoke in public about science 
and religion as “two separate closets in the 
same house”. In the end he highly valued 
Christian philosophy, but did not find faith. 
He resolved to work to the limit of exhaustion. 
Some people thought that he was ambi- 
tious, but his drive came from his conviction 
that work was the only meaningful human 
activity. For Allan, life was not about fun. 

Allan sometimes called himself a 
curmudgeon. But in his social life he was 
radiant with charm and wit; his after-dinner 
talks could make his audience explode with 
laughter. He loved books and was a fan of 
opera. His favourite time was sitting at the 
dark telescope, as he did for so many nights 
of his life, taking long exposures while the 
dome reverberated with the music of composer
Richard Wagner.
Figure 1 | A Drosophila larva. The photoreceptors that mediate light avoidance are labelled green, with the posterior epidermal cells of each segment labelled red.


Feel the light 


How is light perceived? The answer that might immediately come to mind is, through the eyes. Fly larvae, however, can 
‘feel’ light using specialized neurons embedded under the cuticle encasing their bodies. SEE ARTICLE P.921 


PAUL A. GARRITY 


Light perception is a highly useful skill.
Like other animals, we humans rely on 
vision to navigate, to locate food and 
mates, and to avoid predators. But biological 
applications of light perception go well beyond 
vision — from basic light-avoidance to circa- 
dian rhythms’. What’s more, photoreceptive 
cells are located not only in the eyes, but also 
in various non-ocular locations, ranging from 
the skin in molluscs’ to the hypothalamus 
deep within a bird’s brain’. Even overtly eyeless 
animals, such as the soil-dwelling nematode 
Caenorhabditis elegans, possess photosensitive 
neurons that help them to avoid the daylight’. 
In this issue, Xiang et al. (page 921) extend
the analysis of non-ocular photoreception to the 
fruitfly Drosophila melanogaster. They describe 
a set of dermal photoreceptors that, surprisingly, 
had previously escaped notice in this well-stud- 
ied organism, and uncover a molecular mecha- 
nism of phototransduction that has not been 
previously encountered in the fly. 

The lives of Drosophila larvae are highly 
focused on burrowing. As they increase in size 
in preparation for adulthood, the larvae must 
feed ravenously, and immersing themselves in 
the nutritious goo of rotting fruit is an excellent 
way to access a surfeit of calories in a hurry. In 
addition, the larvae are highly vulnerable to 
predation when exposed, because their squishy 
bodies crawl along rather slowly. Burrowing 
helps them keep out of harm’s way. Tunnelling 
therefore provides a singular solution for the 
larva's need for both feeding and defence. 

One of the cues that fly larvae use to orient 
the crucial drive towards the interior is light.
Young fly larvae are highly photophobic, and 
this behaviour involves a pair of primitive 
eye-like structures inside the larva, near its 
anterior. These structures, called Bolwig
organs, resemble the compound eye of the 
adult fly in many respects, including their 
expression of light-sensing rhodopsin 
pigments. But whereas Bolwig organs can
lead larvae out of the light, their anterior loca- 
tion raises a potential problem: once the larval 
anterior is submerged, the light-driven force 
for burrowing should diminish. This could 
leave the larva in the awkward position of 
posterior exposure, like the proverbial ostrich 
with its head in the sand. 

Xiang et al. elegantly attack this ethological
conundrum. By genetically ablating the Bolwig 
organs, the authors show that, although 
Bolwig neurons are crucial for avoiding low 
light intensities, the requirement for these cells 
wanes as light intensities approach those of 
direct sunlight, around 1 mW per mm².

This observation suggests that flies contain 
additional photoreceptors. Suspecting that 
these photoreceptors could be analogous to 
dermal photoreceptors described in other 
creatures, the authors systematically scanned
the sensory neurons along the larval body wall 
for physiological responses to light. They note 
that one particular set of sensory neurons — 
the class-IV da neurons — is strongly activated 
by light (Fig. 1). Satisfyingly, these neurons 
remain light responsive even when grown in 
isolation in culture, confirming their intrinsic 
light sensitivity. Flies therefore contain dermal 
photoreceptors. 
Do these dermal photosensors mediate
avoidance of high-intensity light? Xiang and 
colleagues’ genetic-ablation experiments indicate
that they do. Killing class-IV da neurons
significantly reduced avoidance at all light 
intensities. Somewhat surprisingly, how- 
ever, killing just these neurons, but leaving 
Bolwig organs intact, dramatically decreased 
responses to high-intensity light. This sug- 
gests that, rather than having overlapping, 
redundant functions, these two classes of 
photosensors drive behaviour over different 
ranges of light intensity. 

Intriguingly, previous studies have
shown that class-IV da neurons also partici- 
pate in aversive responses to noxious heat and 
mechanical force. Together with the present 
results, it seems that these neurons serve as 
multi-purpose triggers of avoidance. On acti- 
vation by light, they may provide that extra jolt 
the larva needs to ensure that its entire body is 
fully protected. 

More surprises were in store when Xiang 
et al. probed the phototransduction machin- 
ery of class-IV da neurons. Activation of 
these cells by light was unaffected when the 
researchers eliminated proteins on which other 
fly photoreceptors depend, such as the photon- 
detecting rhodopsins. Instead, it depended on 
another G-protein-coupled receptor, Gr28b. 

Initially classified as a gustatory receptor, 
LITE-1 — a nematode relative of Gr28b — 
was recently discovered to mediate phototransduction
in C. elegans. But although
Gr28b and LITE-1 are related, initial evidence 
suggests differences in the phototransduction 
pathways in which they are involved. LITE-1
acts through the cyclic nucleotide cGMP to
activate cyclic-nucleotide-gated ion channels.
Xiang and colleagues’ pharmacological data
suggest, however, that these channels might
not be required for Gr28b activity. Instead,
phototransduction in the class-IV da neurons
relies on a member of the TRP family of cation
channels, TRPA1.

CHUN HAN, UCSF

Drosophila TRPA1 is known to act as a molecular
sensor of temperature and of reactive
electrophiles, such as the wasabi ingredient
allyl isothiocyanate. TRPA1 is also distantly 
related to the TRP channels that act down- 
stream of rhodopsins in the fly, although this 
protein was not previously implicated in photo- 
detection. Precisely how TRPA1 cooperates 
with Gr28b to mediate phototransduction 
remains to be determined, but activation 
by G-protein signalling seems a reasonable 
possibility. 

A key issue this paper raises is the mechanism(s)
by which proteins such as LITE-1
and Gr28b participate in phototransduction. 
When misexpressed, LITE-1 can make cells 
photosensitive, suggesting that it could
participate in photon detection. Whether 
Gr28b shares this capability is not known, but 
it raises the question of how photons might 
interact with these molecules, and whether the 
mechanisms used by rhodopsins might have 
some relevance here. As Gr28b and LITE-1 
have additional relatives in flies, worms and 
other invertebrates, related pathways may be 
deployed elsewhere in these animals. From a 
broader evolutionary perspective, one wonders 
about the origins of these light sensors and the 
extent to which their functional analogues may 
occur in other present-day organisms, but have 
simply escaped our notice — as was the case for 
so long in Drosophila.
A strange ménage à trois


The two Magellanic Clouds may have joined our Milky Way quite recently. It turns 
out that this trio of galaxies is remarkably unlike most other galaxy systems — 
both in the luminosity of the clouds and in their proximity to the Milky Way. 


SIDNEY VAN DEN BERGH 


We are all Copernicans now. So
we expect to be living in a typical
galaxy in a normal neighbourhood.


The first of these expectations is fulfilled: our 
Milky Way is a relatively normal giant galaxy 
with fairly loosely wound spiral arms (Hubble 
type Sbc), or perhaps a spiral giant with a cen- 
tral bar-shaped region of stars (SBbc). But the 
second expectation is not fulfilled: the Galactic 
neighbourhood is unusual and quite different 
from what might have been expected. True, 
the Local Group that we belong to is a small 
cluster, like many others in nearby regions of 
the Universe. However, the nearest neighbours 
to our home Galaxy have been observed to 
exhibit remarkable peculiarities. Two papers, 
one in Monthly Notices of the Royal Astronomical
Society and the other a recent preprint,
now reinforce these observations. 

For most galaxies, including Andromeda,
the nearest neighbours are elliptical galaxies or 
lenticulars (an intermediate type between an 
elliptical and a spiral galaxy), whereas the more 
distant companions are spirals with loosely
bound spiral arms or galaxies with an irregular
shape. However, the Milky Way's two closest
big companions, the Large Magellanic Cloud
(LMC; Fig. 1) and the Small Magellanic Cloud
(SMC), are irregular galaxies. This anomaly
suggests that the Magellanic Clouds might
not always have been close satellites of the
Galaxy, but instead that they might be objects
formed in the outer reaches of the Local Group
and that just happen to be passing close to the
Milky Way at present. Recent calculations
suggest that there is a probability of about 72%
that the Magellanic Clouds were accreted onto
the Milky Way within the past billion years,
and a roughly 50% probability that they were
accreted together.

Figure 1 | The Large Magellanic Cloud. Calculations by James and Ivory and by Liu et al. suggest that the a priori probability of the Milky Way having a nearby satellite galaxy as luminous as the Large Magellanic Cloud is very low.

The second anomaly among the closest large
companions to our Galaxy is that the LMC is
extraordinarily luminous for a Magellanic-
like irregular galaxy. In nearby regions of the
Universe, there are only two Magellanic-like
irregular galaxies (NGC 4214 and NGC 4449)
that even come close to rivalling the LMC in
luminosity. In other words, the LMC seems
to be close to the upper luminosity limit for
Magellanic-like irregular galaxies. This is
important, because there is a fundamental
morphological difference between spirals 
and Magellanic-like irregular galaxies: spirals, 
which have a large range of luminosities, all 
have nuclei, whereas Magellanic irregulars, 
which are mainly quite faint, do not. It should 
be emphasized that this upper luminosity limit 
applies only to Magellanic irregulars and not 
to the peculiar, chaotic irregular galaxies that 
might have been formed during the collisions 
or mergers of massive ancestral galaxies. 

In 1969, Erik Holmberg searched for the
satellites of nearby galaxies on the photographic
prints of the Palomar Sky Survey.
Surprisingly, he found that bright satellite
galaxies like the Magellanic Clouds are quite
rare. This conclusion is now strengthened and
confirmed by the work of James and Ivory
and that of Liu and colleagues. James and
Ivory used narrow-spectral-band imaging 
of 143 luminous spiral galaxies comparable 
to the Milky Way to search for star-forming 
companions. They concluded that luminous, 
star-forming satellite galaxies resembling the 
Magellanic Clouds are quite uncommon, and 
that our home Galaxy is unusual, both for 
the luminosity and the proximity of its two 
brightest satellites (the Magellanic Clouds). 

A different approach was employed by Liu
et al., who used the enormous database provided
by the Sloan Digital Sky Survey to search
for satellite galaxies, around Milky-Way-like 
host galaxies, that have luminosities similar to 
those of the Magellanic Clouds and that are 
located within a distance of 150 kiloparsecs of 
their apparent host galaxy; the LMC and the 
SMC are only 50 and 60 kiloparsecs, respec- 
tively, away from the Milky Way. For 22,581 
Milky-Way-like hosts, Liu et al. found that 
81% have no satellites as bright as the Magel- 
lanic Clouds, 11% have one such satellite, and 
only 3.5% host two such galaxies. As Edwin
Hubble said many years ago, “The fact that
the [G]alactic system is a member of a group
is a very fortunate accident.” That the Galaxy
should have an irregular companion as luminous
as the Large Magellanic Cloud is almost
a miracle.
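
To give a feel for the scale of the Liu et al. statistics quoted above, the percentages can be converted into approximate numbers of host galaxies. This is only a back-of-envelope sketch: the variable and function names are invented here, the counts are simply the quoted percentages applied to the 22,581 hosts, and the few per cent of hosts not covered by the three quoted figures are not broken down in the text.

```python
# Back-of-envelope conversion of the quoted percentages into host counts.
# Assumption: each percentage applies to the full sample of 22,581 hosts.
N_HOSTS = 22_581

quoted_percent = {
    "no satellite as bright as the Magellanic Clouds": 81.0,
    "one such satellite": 11.0,
    "two such satellites": 3.5,
}


def approximate_counts(n_hosts, percents):
    """Return the rounded number of hosts implied by each quoted percentage."""
    return {label: round(n_hosts * p / 100.0) for label, p in percents.items()}


counts = approximate_counts(N_HOSTS, quoted_percent)
# Share of the sample that the three quoted figures do not account for.
unaccounted_percent = 100.0 - sum(quoted_percent.values())

for label, n in counts.items():
    print(f"{label}: roughly {n} hosts")
print(f"not itemized in the text: about {unaccounted_percent:.1f}% of hosts")
```

On these figures, hosts with two Magellanic-like satellites number in the hundreds out of more than twenty thousand, which is the sense in which the Milky Way's configuration is unusual.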

DRUG DISCOVERY


How melanomas 
bypass new therapy 


The promise of an exciting new drug that inhibits the mutant B-RAF protein in 
skin cancer is marred by the fact that most patients relapse within a year. Fresh 
data hint at how such resistance emerges. SEE LETTERS P.968 & P.973 


DAVID SOLIT & CHARLES L. SAWYERS 


Activating mutations in the kinase B-RAF,
a regulator of cell proliferation and survival,
occur in 60% of patients with melanoma, an
often fatal form of skin cancer¹. This discovery,
made through large-scale sequencing of cancer
genomes, provided the rationale for the
development of PLX4032, a drug that inhibits
B-RAF and has shown remarkable clinical
activity in patients with B-RAF-mutant
melanomas²,³. PLX4032 is under study in a
phase III clinical trial, which could result in its
approval within a year. But this clinical success
is only transient: resistance to PLX4032
develops quickly, typically within 8–12 months
following treatment, even in patients whose
tumours seem to have disappeared on
radiographic scans. Two papers⁴,⁵ in this issue
provide the first clues as to why.

Resistance to cancer drugs, like that to
antibiotics, is an unfortunate, yet familiar,
story. The anticancer drugs erlotinib, crizotinib
and imatinib, which also function by inhibiting
kinases, are effective treatments for lung
cancer, leukaemia and gastrointestinal stromal
tumours. Nonetheless, the long-term efficacy
of all these compounds is also limited by drug
resistance.

Understanding the resistance mecha- 
nism can provide clues both for developing 
improved versions of a drug and for guiding 
the selection of appropriate drug combina- 
tions for therapy. For instance, drug-resistant 
tumour cells often contain secondary muta- 
tions in the target kinase that prevent the drug 
from binding, while retaining the kinase’s 
full oncogenic activity. The discovery of this 
mechanism of resistance in chronic myeloid 
leukaemia accelerated the development of 
two next-generation inhibitors (dasatinib and
nilotinib), which are likely soon to become
front-line therapies⁶,⁷.

Given this precedent, one would expect
secondary mutations in B-RAF to be the primary
cause of resistance to PLX4032. Shockingly,
however, Nazarian et al.⁵ (page 973) show
that this is not the case. The authors used
next-generation sequencing technology to
exhaustively examine the genomes of tumour
samples from 12 patients with acquired
resistance to PLX4032, but found no secondary
B-RAF mutations. This observation is even
more surprising because, in experimental
models⁸, mutations engineered into the
‘gatekeeper’ threonine residue in the ATP-binding
pocket of mutant B-RAF are known to confer
PLX4032 resistance.

Figure 1 | The short and long roads to PLX4032 resistance⁴,⁵. a, Short road. In cells expressing mutant B-RAF,
overexpression of RAF1, or activation of RAS due to N-RAS mutation, results in the formation of
B-RAF-RAF1 heterodimers and/or RAF1-RAF1 homodimers, causing resistance to PLX4032.
Alternatively, overexpression of COT results in RAF-independent activation of MEK and ERK and thus
resistance to PLX4032. In such cells, therefore, PLX4032 resistance is mediated by reactivation of the
MAPK/ERK signalling pathway. b, Long road. Another possibility is that activation of upstream receptor tyrosine
kinases (RTKs) such as PDGFRβ makes MEK activity redundant by triggering downstream effectors of
cell transformation through parallel signalling pathways.

But aficionados appreciate that oncogenic 
versions (alleles) of the B-RAF gene already 
violate other sacrosanct kinase rules. Like 
most other kinases, normal B-RAF forms 
B-RAF-B-RAF homodimers or heterodimers 
(often with the related protein RAF1) in 
response to upstream signals such as activation 
of the signalling molecule RAS. Oncogenic
B-RAF, by contrast, signals as a monomer,
activating downstream signals in the absence of 
upstream input. 

PLX4032 selectively shuts down the activ- 
ity of the mutant B-RAF monomers, potently 
blocking the growth of tumours that have 
B-RAF mutations. Paradoxically, this drug 
also activates downstream signalling in 
cells lacking B-RAF mutations — both non- 
cancerous cells and tumour cells with normal 
B-RAF — through transactivation of the non- 
drug-bound partner in B-RAF-RAF1 heterodimers
and RAF1-RAF1 homodimers⁹⁻¹¹.
This explains both the exquisite specificity 
of the drug for B-RAF-mutant tumours and, 
most probably, the complication of low-grade 
squamous cell carcinomas that can arise from 
normal skin cells in some patients being treated 
with PLX4032. 

The new papers⁴,⁵ suggest at least three different
mechanisms of resistance, all of which
share the common theme of ‘oncogene bypass’.
This concept was first invoked by the discovery¹²
that amplification of the tyrosine kinase
MET bypasses the therapeutic effect of erlotinib
on the oncogenic protein EGFR in lung
cancer, thereby causing erlotinib resistance.

Following the EGFR/MET precedent,
Johannessen and colleagues⁴ (page 968)
methodically examined kinase-coding genes to see if
any conferred resistance to PLX4032. They
obtained two compelling hits: RAF1, which
was expected from earlier work¹³ in vitro; and
COT/TPL2, which, like B-RAF and RAF1,
functions upstream of the kinase MEK (B-RAF
functions through the MAPK/ERK signalling
pathway). In both cases, resistance seems to be
due to restored MEK activation in the face of
PLX4032 treatment, that is, B-RAF bypass
(Fig. 1a). The clinical importance of these hits
remains to be defined, because the authors
isolated both genes under artificial screening
conditions. COT, however, is a particularly
intriguing candidate, because it is amplified
in a few cell lines with B-RAF mutations that
show intrinsic PLX4032 resistance⁴.


Nazarian and co-workers⁵ took a different
approach. They focused entirely on changes
in signalling pathways in 12 matched pairs
of drug-sensitive and drug-resistant tumour
samples obtained from patients with
B-RAF-mutant melanoma who were treated with
PLX4032. In one patient, two independent,
drug-resistant subclones of the original tumour
acquired new activating mutations in N-RAS,
while retaining the original mutant B-RAF
allele. As is the case for overexpression of COT
and RAF1, expression of the mutant N-RAS
allele restored activation of MEK, conferring
PLX4032 resistance (B-RAF bypass).

These data suggest that any perturbation that 
restores MEK activation (RAS mutation, RAF1 
or COT overexpression and perhaps other per- 
turbations) has the potential to cause resistance 
to PLX4032. However, the data obtained from 
other patients Nazarian et al. examined suggest 
a different mechanism. 

In samples from five patients in relapse, the
researchers⁵ documented increased activation
of the receptor tyrosine kinase PDGFRβ,
compared with the baseline. Like increased
expression of COT and mutant N-RAS,
overexpression of PDGFRβ was sufficient to confer
resistance to PLX4032, albeit independently of
MEK activation (Fig. 1b). In the remaining six
of the twelve patients the authors studied, the
resistance mechanism remains unexplained.
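
The logic of the 'short road' and 'long road' to resistance described above can be summarized as a toy Boolean model. The sketch below is purely illustrative: the class, field and function names are invented here, the all-or-nothing switches ignore dosage and kinetics, and nothing in it is taken from the methods of either paper.

```python
from dataclasses import dataclass


@dataclass
class TumourCell:
    """Toy description of a B-RAF-mutant melanoma cell (illustrative only)."""
    raf1_overexpressed: bool = False
    cot_overexpressed: bool = False
    nras_mutant: bool = False
    pdgfrb_activated: bool = False


def mek_active(cell: TumourCell, plx4032: bool) -> bool:
    """MEK is driven by mutant B-RAF unless PLX4032 blocks it, or by any of
    the bypass routes of the short road (RAF1, COT or mutant N-RAS)."""
    braf_signal = not plx4032
    bypass = (cell.raf1_overexpressed
              or cell.cot_overexpressed
              or cell.nras_mutant)
    return braf_signal or bypass


def proliferates(cell: TumourCell, plx4032: bool) -> bool:
    """Growth continues if MEK is active (short road) or if a parallel
    pathway downstream of PDGFRbeta takes over (long road)."""
    return mek_active(cell, plx4032) or cell.pdgfrb_activated


examples = {
    "drug-sensitive": TumourCell(),
    "COT overexpression": TumourCell(cot_overexpressed=True),
    "N-RAS mutation": TumourCell(nras_mutant=True),
    "PDGFRbeta activation": TumourCell(pdgfrb_activated=True),
}

for name, cell in examples.items():
    print(f"{name}: MEK active on drug = {mek_active(cell, True)}, "
          f"proliferates on drug = {proliferates(cell, True)}")
```

In this simplified picture, the PDGFRβ case is the only one in which the cell keeps growing on the drug while MEK stays off, which is the distinction the two panels of Fig. 1 draw.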

Will the present findings⁴,⁵ have a clinical
impact? It is too early to tell, given the
limited number of patients examined and
the heterogeneity of potential resistance
mechanisms. But the fact that restored MEK
activation is sufficient to confer resistance in
models of B-RAF-mutant melanoma raises the
question of whether, at least in some patients,
resistance to PLX4032 can be overcome using
MEK inhibitors.

Indeed, if MEK activation is commonly 
found at relapse, regardless of the mecha- 
nism, one could make a strong argument for 
combined therapy using PLX4032 and a MEK 
inhibitor to either prevent or delay resist- 
ance. The added bonus of this combination 
is prevention of MEK activation in normal 
tissues — a side effect of PLX4032 — which 
could overcome the complication of low-grade 
squamous-cell carcinomas emerging in some 
patients.


David Solit and Charles L. Sawyers are
in the Human Oncology and Pathogenesis
Program, Memorial Sloan Kettering Cancer
Center, New York, New York 10065, USA.
e-mail: sawyersc@mskcc.org


1. Davies, H. et al. Nature 417, 949–954 (2002).
2. Bollag, G. et al. Nature 467, 596–599 (2010).
3. Flaherty, K. T. et al. N. Engl. J. Med. 363, 809–819 (2010).
4. Johannessen, C. M. et al. Nature 468, 968–972 (2010).
5. Nazarian, R. et al. Nature 468, 973–977 (2010).
6. Kantarjian, H. et al. N. Engl. J. Med. 362, 2260–2270 (2010).
7. Saglio, G. et al. N. Engl. J. Med. 362, 2251–2259 (2010).
8. Whittaker, S. et al. Sci. Transl. Med. 2, 35ra41 (2010).
9. Poulikakos, P. I., Zhang, C., Bollag, G., Shokat, K. M. & Rosen, N. Nature 464, 427–430 (2010).
10. Hatzivassiliou, G. et al. Nature 464, 431–435 (2010).
11. Heidorn, S. J. et al. Cell 140, 209–221 (2010).
12. Engelman, J. A. et al. Science 316, 1039–1043 (2007).
13. Montagut, C. et al. Cancer Res. 68, 4853–4861 (2008).


Recipe for making 
Saturn’s rings 


Simulations show that the still-mysterious origin of Saturn’s vast, icy rings could 
be explained by the ‘peeling’ by Saturn’s tides of the icy mantle of a large satellite 
migrating towards the planet. SEE LETTER P.943 


AURÉLIEN CRIDA & SÉBASTIEN CHARNOZ


Since Christiaan Huygens realized in 1655
that the planet Saturn is encircled by a
ring, this “jewel of the Solar System” has
defied researchers’ best efforts to explain its
origin. Saturn’s rings are made of centimetre-
to metre-sized boulders of almost pure water
ice¹, a unique characteristic among Solar
System bodies (comets and planetary satellites
contain about 50% silicates and metals).
The total mass of Saturn’s rings is thought to
be equivalent to that of a satellite about 500
kilometres across²,³. But how and when the
rings formed, and why they are so clean of
silicates, is not understood. Several mechanisms
for their formation have been proposed⁴⁻⁶, but
none has provided a convincing explanation
for their observed peculiarities. On page 943 of
this issue, Canup⁷ offers an attractive solution
to the problem that answers several questions
at once*.


*This article and the paper under discussion⁷ were
published online on 12 December 2010.

Figure 1 | Canup’s model for the formation of Saturn’s icy rings⁷. a, A differentiated satellite in the
gaseous circumplanetary disk around Saturn migrates towards the planet. When the satellite crosses the
planet’s Roche limit, its icy mantle starts to be pulled into pieces by the planet's tidal forces. b, The silicate
core carries on migrating towards Saturn and eventually falls into it, leaving behind the ice boulders that
give birth to the rings and icy satellites of the planet. (Figure labels: Roche limit; gaseous
circumplanetary disk; icy mantle.)


Saturn's rings are in a location that is domi- 
nated by tides. Generally, boulders in circu- 
lar orbits, such as those making up the rings 
of Saturn, merge and grow to form larger 
bodies because of their own gravity — this 
process is thought to be the way in which 
planets and asteroids form around the Sun 
and satellites form around giant planets. 
But because Saturn's rings are so close to the 
planet (below the planet’s Roche limit, which 
is 2.5 times the planetary radius), the tidal 
forces that the planet exerts on them prevent 
the boulders from accreting material. Just as 
the Moon's tides stretch the oceans on Earth, 
Saturn's tides stretch any boulder aggregates 
in the rings, and separate their constituents. 
Similarly, if a pre-existing icy body were to be
placed within Saturn’s Roche limit, it would 
be destroyed by the planet's tides. 
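
The dependence of the Roche limit on the density of the orbiting body can be illustrated with the classical formula for a fluid, self-gravitating satellite, d ≈ 2.44 R (ρ_planet/ρ_body)^(1/3). The sketch below is not taken from the work discussed here: the formula is the textbook fluid-body approximation, and the densities (Saturn's mean density, solid water ice, a generic silicate rock) are round illustrative values chosen for this example. The exact figure for any real body depends on its density and internal strength.

```python
# Classical Roche limit for a fluid, self-gravitating body:
#   d = 2.44 * R_planet * (rho_planet / rho_body) ** (1/3)
# All numerical values below are illustrative assumptions.
SATURN_MEAN_DENSITY = 687.0   # kg/m^3, mean density of Saturn
WATER_ICE_DENSITY = 917.0     # kg/m^3, solid water ice (assumed)
SILICATE_DENSITY = 3000.0     # kg/m^3, generic silicate rock (assumed)


def roche_limit_in_planet_radii(rho_planet, rho_body, coefficient=2.44):
    """Roche limit in units of the planet's radius (fluid-body approximation)."""
    return coefficient * (rho_planet / rho_body) ** (1.0 / 3.0)


ice_limit = roche_limit_in_planet_radii(SATURN_MEAN_DENSITY, WATER_ICE_DENSITY)
rock_limit = roche_limit_in_planet_radii(SATURN_MEAN_DENSITY, SILICATE_DENSITY)

print(f"Roche limit for ice:  {ice_limit:.2f} Saturn radii")
print(f"Roche limit for rock: {rock_limit:.2f} Saturn radii")
```

Under these assumptions the limit for ice lies farther from the planet than the limit for rock, so there is a range of orbital distances at which tides can disrupt icy material while a denser silicate body holds together.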

These considerations suggest a simple recipe 
for making Saturn’s rings: take a large body 
and put it into a close orbit around the planet; 
tides should then destroy it, and the resulting 
fragments should form the rings. But what 
sort of body? A large, differentiated satellite,
with a core composed of silicates and
iron and a lighter mantle made of water ice,
would be a good choice. The rings could then
be formed simply by ‘peeling off’ the satellite's 
icy mantle from the core using the planet’s 
tides as a knife*®. Although appealing, how- 
ever, this recipe has never been investigated 
owing to computational limitations. 

Enter Canup⁷, who describes the details of a
numerical and analytical model for tidal split- 
ting of a differentiated satellite around Saturn. 
She demonstrates that the planet’s tidal forces 
can indeed be strong enough to take water 
ice away from the satellite, but not to tear its 
dense silicate core. But how can the differenti- 
ated satellite be brought so close to the planet, 
within the Roche limit? When could this have 
occurred? And what mechanism can get rid 
of the silicate core that has not been broken by 
the tides? Canup solves these problems using a 
single phenomenon: planetary migration. 

The planets of our Solar System formed 
4.5 billion years ago in a disk of gas and dust 
that surrounded the young Sun for a few mil- 
lion years. At the same time, the satellites of 
the giant planets formed in gaseous and dusty 
disks around their hosts. But when a body 
orbits inside a gaseous disk, its orbit shrinks 
and the body spirals gradually towards the 
planet as a result of the gravitational interaction
of the body with the surrounding gas. In the
circum-Saturn disk, satellites grew and fell into
the planet in this way, until the last generation
formed and escaped inward migration because
the disk had vanished⁸. Canup suggests that the
last migrating differentiated satellite, about the 
size of Titan (Saturn’s largest moon), had its 
icy mantle pulled to pieces by Saturn's tides as 
it crossed the Roche limit. Beyond this limit, 
the satellite’s silicate core carried on migrating 
inwards and eventually disappeared into Sat- 
urn, leaving behind the ice boulders that make 
up the rings (Fig. 1). This process would have 
produced icy rings about 1,000 times more 
massive than the rings are today. 

This elegant model⁷ could provide the missing
links between a suite of observational and
theoretical results that have changed our under- 
standing of Saturn’s rings. Such massive rings 
would be less sensitive to the darkening effect 
of meteoroid bombardment*”, thus explaining 
their brightness today. In addition, because of 
their mass they should spread more rapidly 
than the present ones, leading, over the age of 
the Solar System, to lighter rings like those seen 
today”"’. During this spreading, the material 
expanding beyond the Roche limit may have 
given birth to satellites; such a mechanism has 
recently been proposed for the formation of 
Saturn’s small moons". Indeed, observations 
made with the Cassini spacecraft have shown 
that accretion processes are still active at the 
outer edge of Saturn’s rings'”’’ and on the 
satellites Pan and Atlas’. Canup suggests that 
the inner satellites of Saturn, up to and includ- 
ing Tethys, could also have formed by accre- 
tion of spreading ring material beyond the 
Roche limit. 

Canup’s model⁷ offers, for the first time, a
convincing starting point for a consistent the- 
ory of the origin of Saturn’s rings and satellites. 
It shows that the rings and satellites are inti- 
mately linked, and that Saturn's system, despite 
being made of ice, is not frozen but is constantly
evolving. The origin of the rings and satellites 
must be understood in the wider framework 
of models of planet formation, and this work 
is one step in that direction. One may question 
whether the specific conditions required for 
such rings to form have also been met around 
other Solar System planets and exoplanets. The 
details and the consequences of the formation 
of the satellites at the outer edge of the rings, 
and their outward migration to their present 
positions, are still to be explored. Establishing 
these details could change our understanding 
of Saturn’s satellites, and more generally of 
giant planets and their environments.


Aurélien Crida is at the Laboratoire
Cassiopée, Université de Nice
Sophia-Antipolis/CNRS/Observatoire de la Côte
d’Azur, BP4229, 06304 Nice Cedex 4,
France. Sébastien Charnoz is at
the Laboratoire AIM, Université Paris
Diderot/CEA IRFU/CNRS, CEA
Saclay, SAp, Centre de l’Orme Les Merisiers,
91191 Gif-sur-Yvette Cedex, France.
e-mail: aurelien.crida@oca.eu


1. Nicholson, P. D. et al. Icarus 193, 182–212 (2008).
2. Esposito, L. W., O’Callaghan, M. & West, R. A. Icarus 56, 439–452 (1983).
3. Robbins, S. J., Stewart, G. R., Lewis, M. C., Colwell, J. E. & Sremčević, M. Icarus 206, 431–445 (2010).
4. Harris, A. in Planetary Rings (eds Greenberg, R. & Brahic, A.) 641–659 (Univ. Arizona Press, 1984).
5. Dones, L. Icarus 92, 194–203 (1991).
6. Charnoz, S., Morbidelli, A., Dones, L. & Salmon, J. Icarus 199, 413–428 (2009).
7. Canup, R. M. Nature 468, 943–946 (2010).
8. Canup, R. M. & Ward, W. R. Astron. J. 124, 3404–3423 (2002).
9. Cuzzi, J. N. & Estrada, P. R. Icarus 132, 1–35 (1998).


The prospects 
for polar bears 


Is the polar bear doomed to extinction? Maybe not, according to models of the 
future extent of Arctic sea ice if greenhouse-gas emissions are curbed. The 
outlook depends on the ability of policy-makers to act. SEE LETTER P.955 


ANDREW E. DEROCHER 


The projected loss of polar bear sea-ice
habitat as a result of a warming climate
will dramatically reduce the spatial and
temporal extent of that habitat by the end of 
the twenty-first century’. Accordingly, demo- 
graphic analyses and population-projection 
models have predicted drastic declines in 
the polar bear population of the southern 
Beaufort Sea’. Combined with dire pre- 
dictions from other regions, these find- 
ings led to polar bears being listed as 
‘Threatened’ throughout their range 
under the US Endangered Species 
Act. However, the listing process 
took no account of the potential 
effects of measures to reduce 
greenhouse-gas emissions, 
and so reduce anthropogenic 
warming. Amstrup et al.’ (page 
955) provide such analyses and 
their results are cause for some 
optimism — if timely action is 
taken. 

Amstrup and colleagues exam- 
ined projection models of sea-ice 
loss based on different greenhouse- 
gas emission scenarios and found that 
mitigation could greatly improve the 
conservation status of polar bears into the 
next century. Reduced emissions, the lower 
the better, would yield greater abundance 
and wider distribution of polar bears than the 
‘business as usual’ emission scenario. Lower 
levels of warming and sea-ice loss would 
improve the conservation outlook not just for 
polar bears, but for other Arctic marine spe- 
cies as well. Nonetheless, most emission sce- 
narios resulted in substantial loss of optimal 
habitats and a major increase in the ice-free 
period over the more biologically productive
continental shelves. Amstrup et al. also reaffirm
earlier findings that emission scenarios with 
high end-of-century levels of carbon diox- 
ide, a major greenhouse gas, will result in 
great loss of polar bears. 

Figure 1 | Distribution of the polar bear
(Ursus maritimus). In this overlay of the Arctic,
darker areas denote higher population density,
lighter areas lower density. Amstrup and colleagues’
model projections show that implementation
of measures to slow global warming improves
the prospects for polar bears. But even with such
measures, various factors could still conspire to
make their future gloomy.

The best-studied polar bear population is in
western Hudson Bay, Canada, and it has shown
a slow decline due to the lower survival and 
reproduction that is correlated with reduced 
sea-ice duration’. The possible occurrence of 
‘tipping points’ (where rising temperatures 
trigger a feedback loop that further drives ice 
loss”), and the summer ice minima’ of 2007, 
sparked concerns of sudden and irreversible 
loss of polar bear habitat. Catastrophic shifts 
have been noted in many different ecosystems’, 
and in ice-covered Arctic marine ecosystems 
continued melting of the sea ice could produce 
episodic changes in predator-prey dynamics 
and rapid restructuring of the food web’. 

Amstrup and colleagues’, however, found no 
evidence of a critical temperature that resulted 
in a tipping point. Rather, they found a linear 
relationship between global mean surface air 
temperature and sea-ice extent, and that rapid 
ice loss could partially reverse. But the authors 
caution that tipping points could occur in the 
real world and that the situation could still be 
dire. Although Arctic marine mammals are 
well adapted to fluctuating environments, 
and can tolerate substantial inter-annual 
change, analyses of polar bear ener- 
getics show that rapid ice loss could 
trigger strong nonlinear declines in
survival’. Given the low reproduc- 
tive rates of these animals, even 
episodic loss of sea ice followed 
by recovery could have serious 
effects. 

Amstrup et al. also note that 
the best possible outcomes for 
polar bears include controlling hunting and
other factors in an effort to make populations
with the expected lower numbers
sustainable. But a ban on hunting 
would be a serious cultural loss for 
the Arctic’s aboriginal people. There 
are also other caveats. The authors warn 
that a lack of data across the full polar 
bear range (Fig. 1) makes predictions of 
future abundance tenuous. Further, they stress 
that the relationships between demographics 
and sea ice are only partly understood, and 
that geographical differences in population 
response to even linear declines in sea ice 
may vary. 

Increased risk of extinction is often associated
with specialization. So it is not surprising
that polar bears, as large predators that
rely on sea ice as the platform on which to hunt
their seal prey, to mate and to travel, are
especially vulnerable. This paper provides
reason to hope that the previous predictions 
of declines in polar bear populations can be 
avoided if concerted efforts are made to reduce 
greenhouse-gas emissions. The threat posed by 
climate change to biological diversity has been 
clear for years, however, and calls for carbon 
sequestration and reduction of emissions to 
conserve species’ have largely gone unheeded. 
Amstrup and colleagues describe polar bears 
as sentinels of Arctic marine ecosystems and 
emphasize the importance of sea-ice habi- 
tats in the global climate system: both would 
benefit from greenhouse-gas reduction. 
There are few indications, however, that
such policies will be implemented in a timely
manner. Globally, 25% of mammalian species
are threatened with extinction, with habitat
loss and degradation being the main causes.
In this context, the plight of polar bears is
sadly typical. Their future remains uncertain,
but it is now more clearly in the hands of
policy-makers. There is cause for optimism,
but that requires optimism about our ability
to change.


MATERIALS SCIENCE


Pleated crystals


A neat study that involves placing colloidal particles on curved oil-glycerol
interfaces reveals a new form of crystal defect. The defect is called a pleat, by
analogy to the age-old type of fabric fold. SEE LETTER P.947

FRANCESCO STELLACCI 
& ANDREAS MORTENSEN 


In 1947, William Bragg and John Nye had
a simple yet brilliant idea: to model atoms
using bubbles. Bubbles are neatly spherical
yet soft, and if many small bubbles, all of the
same size, are blown onto the surface of a soapy
film, they assemble into a hexagonal array
(raft) that mimics the atomic arrangements
of flat, two-dimensional crystals. On page 947
of this issue, Irvine and colleagues present
an extension of the ‘bubble-raft technique’ to
visualize the behaviour of two-dimensional
crystals along curved surfaces.

Bubble rafts are ideal tools for studying crystals.
They are imperfect: exactly like crystals,
they can contain defects such as vacancies,
impurities, dislocations and grain boundaries.
In two-dimensional crystals, dislocations are
point defects (line defects in three dimensions)
formed by the termination of a row or column
(a plane in three dimensions) of periodically
aligned atoms (Fig. 1). Dislocations are important
because, if they move, they shift matter
by one atomic-lattice spacing about their
trajectory; in three dimensions, dislocations are
the main mechanism by which metals deform
permanently. They are also important in
crystalline materials because arrays of dislocations
form the building blocks of subgrain boundaries,
that is, grain boundaries separating grains that
are misoriented by only a few degrees.

The question of whether dislocations can
drive the irreversible deformation (plasticity)
of materials had been controversial, but was
vividly addressed by Bragg and Nye’s original
bubble rafts: when stress was applied to the 
rafts, dislocations were seen to move and 
deform them. Movies of bubble rafts offer one 
of the most eloquent visualizations of disloca- 
tions, and are still used today to tackle subtle 
problems in the plastic deformation of both 
crystalline and amorphous materials*. They 
can be seen on YouTube, and are often used to 
illustrate the nature and significance of defects 
in crystals to students. 

Meanwhile, the bubble-raft technique 
has been extended to three dimensions, and 
has been widely used to address problems in 
materials science ranging from melting and 
other phase changes in crystalline materials*” 
to the nature of the glassy state’. In these stud- 
ies**, the three-dimensional bubble rafts are 
not arrangements of actual bubbles, but instead 
consist of colloidal crystals, and specialized 
microscopic techniques are used to study their 
behaviour. Yet, to date, the bubble-raft princi- 
ple has seldom been used to study the interplay 
between the periodicity of a two-dimensional 
crystal’s lattice structure and the curvature of 
the surfaces on which it is laid. 

In two-dimensional bubble rafts, gravity
keeps the bubbles (or crystals) within the flat,
liquid-solution surface over which the bubbles
are laid. If the bubble raft were transported into
space, with gravity now absent, several things
would happen. One is that dislocations in the
bubble raft might buckle. In one of Bragg and
Nye’s original rafts, an extra line of bubbles
terminating at a dislocation is visible, and causes
the surrounding bubbles to be squeezed against
one another (Fig. 1). This squeezing results in
an increase in the internal elastic energy stored
in the raft, which can be relieved if bubbles at
the top of the dislocation move out of the plane
of the raft. Such buckling in turn curves the
raft’s surface. In addition, a bubble raft in
space will curve more globally, because the
liquid-solution surface can seek a shape that
minimizes its total surface energy: the surface
becomes globally curved. This creates interesting
complications for the way in which bubbles
pack, because in such two-dimensional arrays
there is an interplay between surface curvature
and defects.

Figure 1 | The bubble raft. This image shows
one of Bragg and Nye’s original bubble rafts,
assemblies of bubbles on the surface of a solution
consisting of water, glycerine, oleic acid and
triethanolamine that can be used to model
two-dimensional crystals. Every other horizontal
bubble row has been coloured green to facilitate
the identification of the column of bubbles that
terminates at a dislocation (red). The bubbles
that form the bottom of this column (and hence
the dislocation) are two opposite disclinations, as
predicted by theory. They have a different number
of nearest neighbours from bubbles elsewhere: 5
and 7 rather than 6. Horizontal bubble rows tilt
downwards slightly to the right of the dislocation.
An array of similar dislocations, regularly spaced
one over the other, creates a boundary across
which the crystal is tilted by a constant angle: this
is a two-dimensional subgrain boundary. Irvine
and colleagues study the interaction between the
curvature of a two-dimensional crystal and the
disclinations or dislocations it contains, and find
short subgrain boundaries, which they call pleats.
(Image reproduced from ref. 1.)


Such interplay is well known for disclina- 
tions. These are points in the crystal where 
the rotational lattice symmetry is broken (for 
dislocations, it is the translational symmetry 
that is disrupted). At a disclination, an atom 
has a different number of nearest neighbours 
from atoms elsewhere (Fig. 1). Disclinations 
come with much lattice deformation, and 
hence lead to an excess of elastic energy in 
the crystal. This energy can be relieved, as for 
dislocations, by curvature in the surface of the 
substrate on which the atoms are packed. As a 
result, disclinations and substrate-surface cur- 
vature interact. It is this interaction between 
curvature and disclinations, or other crystal 
defects in general, that Irvine and colleagues¹
investigate in their study. 

It is not necessary to go into space for 
gravity’s effect to vanish and curve the sur- 
faces: all that is needed is a sufficiently small 
system. But producing small systems that 
can be used to study the physics of two- 
dimensional crystals on curved surfaces is 
challenging. Irvine and co-workers cir- 
cumvented this problem by using colloidal 
particles on curved oil-glycerol interfaces 
to visualize real, two-dimensional crystals 
on curved surfaces. They find that the crys- 
tals display both five- and seven-neighbour 
disclinations in curved regions, just as predicted
by theory’. What’s more, they show
that two-dimensional subgrain boundaries 
can form on curved surfaces; they call these 
pleats, by analogy to the age-old type of 
skirt. And here, too, they find a correlation 
with curvature: the pleats’ presence, exten- 
sion and position depend on the curvature. 
This interplay between pleats and curvature 
makes the nature and role of pleats interest- 
ing, potentially opening up new avenues to 
explaining some of the many phenomena, 
both natural and man-made, that are governed 
by the physics of two-dimensional curved 
crystals.
Angelman syndrome connections


Neuronal networks in the brain that develop early in life underlie our ability to 
learn, remember and communicate. Genetic defects that perturb the fine-tuning 
of such neuronal connectivity can cause disease. 


PETER SCHEIFFELE & ASIM A. BEG 


Despite remarkable progress in understanding
the cellular basis of neurodevelopmental
disorders, their genetic
causes have proved difficult to identify. One 
such example is Angelman syndrome — a 
severe disorder that is characterized by 
reduced intellect, epileptic seizures, poor 
language skills, and unusual laughter and 
happiness¹. Although the pathogenesis remains
to be clearly defined, there is unequivocal
genetic evidence that disruptions in the UBE3A
gene can cause Angelman syndrome². Two
papers³,⁴ in Cell show that, in mice, the UBE3A
protein regulates several aspects of neuronal 
connectivity and function. 
Genetically engineered mice lacking a 
functional Ube3A gene represent a model for 




Angelman syndrome. These mice recapitulate 
several neurological traits observed in human 
patients, including learning deficits and specific
alterations in neuronal circuitry⁵. The
formation of synaptic connections between 
individual neurons in the brain initially 
appears normal. However, the refinement of 
synaptic connectivity is disrupted, leaving 
neuronal networks with reduced numbers of 
dendritic spines — the neuronal structures 
that harbour most synapses*. The neuronal 
networks of the mutant mice also exhibit an 
unusual rigidity (or lack of plasticity), which 
is thought to underlie the cognitive alterations 
in patients and their inability to learn relatively 
easy tasks. 

These disease-related defects highlight a 
fundamental question of neurodevelop- 
ment: how do neurons specify and control 
Figure 1 | UBE3A and excitatory synapses. a, In normal neurons, UBE3A tags substrate proteins Arc (ref. 4)
and Ephexin-5 (ref. 3) with ubiquitin, targeting them for proteasomal degradation. Decreasing the levels 
of Arc and Ephexin-5 is a key determinant of dendritic-spine density, maturation and synaptic function. 
b, In Ube3A-deficient neurons, Arc and Ephexin-5 accumulate. High Ephexin-5 levels limit the growth 
and maturation of dendritic spines by promoting RhoA activity. Increased Arc levels enhance removal of 
the AMPA-type glutamate receptors, leading to a decrease in synaptic function. 
the number of connections that they form?
In other words, are there brakes that prevent 
premature and overabundant connectivity at 
early developmental stages? And how might 
such brakes be removed for synapse formation 
to proceed? 

Functional analysis of UBE3A has pro- 
vided some clues with which to answer these 
questions. This protein is an E3 ubiquitin 
ligase — an enzyme that transfers the small 
protein ubiquitin onto substrate proteins. Once 
tagged, these substrates are degraded by the 
proteasome complex and are removed from 
the cell. UBE3A thus controls the levels of 
specific substrate proteins’. 

In Angelman syndrome, the neuronal level 
of UBE3A substrate proteins is abnormally 
high, causing perturbed brain development 
and function’. Although some UBE3A target 
proteins have been identified, the key sub- 
strates responsible for the development of 
Angelman syndrome have remained obscure. 
Margolis et al.³ and Greer et al.⁴ identify two
UBE3A substrates that may explain sev- 
eral aspects of neuronal dysfunction in this 
disorder. 

Margolis and colleagues provide compelling 
evidence that UBE3A ubiquitinates a neu- 
ronal signalling molecule called Ephexin-5 
(Fig. 1a). Levels of Ephexin-5 are increased in 
mouse models of Angelman syndrome. In a
series of experiments, the authors probe what 
happens to neuronal circuits when cells express 
either too much or too little Ephexin-5. They 
find that this protein restricts synaptic devel- 
opment. During early postnatal life, Ephexin-5 
levels are high and keep the number of synaptic 
connections low, so that neurons do not indis- 
criminately form synapses with any cell that 
they contact. 

Ephexin-5 blocks synapse formation by acti- 
vating the signalling protein RhoA — a GTPase 
that limits the growth of dendritic spines. 
However, when two neurons interact, activa- 
tion of the signalling receptor EphB promotes 
phosphorylation of Ephexin-5. Phosphorylated
Ephexin-5 is subsequently ubiquitinated
by UBE3A, tagging it for degradation. Once 
Ephexin-5 is removed, the establishment of 
new synaptic connections can proceed. How- 
ever, in Angelman-syndrome neurons UBE3A 
is absent, allowing Ephexin-5 to accumulate; 
consequently, the cells exhibit increased 
RhoA activity and form too few synapses 
(Fig. 1b), resulting in dysfunction of neuronal 
networks. 

Whereas Ephexin-5 controls synapse 
number, Arc (the other UBE3A target, which
Greer et al.⁴ identified) controls synapse
function and plasticity. Specifically, Arc pro- 
motes the removal of synaptic receptors that 
respond to the neurotransmitter glutamate; 
this results in either a reduction or silencing
of synaptic communication⁶. Greer and
colleagues discovered that UBE3A binds to 
and ubiquitinates Arc, thereby promoting its 
degradation. In UBE3A-deficient neurons, Arc
accumulates, leading to a decrease in synaptic 
levels of glutamate receptors and a disruption 
of normal synaptic function (Fig. 1). 

The discovery of two UBE3A substrates 
provides a mechanistic explanation for at least 
some aspects of Angelman syndrome at a 
cellular level. Dysregulation of Ephexin-5 and
Arc represents potential causes of alterations in
synapse number and plasticity, respectively, 
both of which are hallmarks of this disorder. 

Notably, these findings may also have 
implications for autism. Some patients with 
autism carry a duplication of the genomic 
region 15q11-13, which contains, among 
other genes, Ube3A. This mutation results 
in increased expression of UBE3A and other 
proteins’. Extrapolating from the data of 
Margolis et al.³ and Greer et al.⁴, increased
UBE3A levels result in reductions in Ephexin-5 
and Arc levels, thereby increasing neuronal 
connectivity and synaptic transmission. These 
hypotheses can be tested in recently generated⁷
mouse models carrying a duplication of this 
genomic region. 

These fresh insights³,⁴ into neuronal dysfunction
in Angelman syndrome also point
to candidate drug targets. For example, 
some of the traits associated with the Ube3A 
mutation in mice can be reversed by activation
of calmodulin-dependent kinase 2, a
signalling protein that has been implicated in 
promoting the delivery of glutamate receptors 
to synapses⁸. The new findings provide a first
step in the attempt to alleviate the neuronal 
defects of Angelman syndrome, as well as 
related neurodevelopmental disorders such 
as autism.


Peter Scheiffele is in the Biozentrum of the 
University of Basel, 4056 Basel, Switzerland. 
Asim A. Beg is in the Department of 
Pharmacology, University of Michigan, 

Ann Arbor, Michigan 48109, USA. 

e-mails: peter.scheiffele@unibas.ch; 
asimbeg@umich.edu 


1. Williams, C. A. et al. Am. J. Med. Genet. A 140, 413-418 (2006).
2. Nicholls, R. D. & Knepper, J. L. Annu. Rev. Genomics Hum. Genet. 2, 153-175 (2001).
3. Margolis, S. S. et al. Cell 143, 442-455 (2010).
4. Greer, P. L. et al. Cell 140, 704-716 (2010).
5. Jiang, Y. et al. Neuron 21, 799-811 (1998).
6. Chowdhury, S. et al. Neuron 52, 445-459 (2006).
7. Nakatani, J. et al. Cell 137, 1235-1246 (2009).
8. van Woerden, G. M. et al. Nature Neurosci. 10, 280-282 (2007).


P.S. declares competing financial interests. 
See online article for details. 


Model’s reputation restored


The structure of a mineral has been validated, ending the controversy about its
potential usefulness as a model of an unusual magnetic lattice. This model might
provide insight into superconductivity.


MARK A. DE VRIES & ANDREW HARRISON 


The mineral herbertsmithite has been
hailed as a rare model of an unconventional
type of magnetism thought
to have a key role in the mechanism of high- 
temperature superconductivity (a form of 
superconductivity that occurs above 30 kelvin). 
But concerns have been raised that chemical 
disorder in this material could produce defects 
in the array of magnetic atoms, disturbing or 
even destroying the properties that make it a 
useful model. In the Journal of the American 
Chemical Society, Daniel Nocera and his group¹
now report that they have clearly identified the 
type of disorder present, and have found it to 
have little influence on the magnetic lattice of 
the mineral. This represents a crucial step in 
establishing a simple, clean system to provide 
unambiguous insight into this important form 
of magnetism. 




A material's magnetism is ultimately derived 
from unpaired electrons, each of which has a 
property called spin (S), with a value of ½, that
bestows on each of them a magnetic moment. 
In insulators, such moments or spins are local- 
ized on atoms, and commonly interact with 
their closest neighbours so that specific spin 
orientations are preferred. In most cases, an 
antiparallel (antiferromagnetic) configura- 
tion of nearest-neighbour spins is favoured 
so that, when a lattice of such atoms is cooled, 
their spins usually freeze to form an ordered
array (Fig. 1a). However, under certain circumstances
(for some magnetic lattices, for
instance) quite different behaviour can ensue.
One such case is the kagome antiferromagnet, 
which is formed from corner-sharing triangles 
(Fig. 1b). In these systems, it is impossible to 
arrange each near-neighbour pair of spins so 
that they are all antiparallel, and the system is 
said to be geometrically frustrated. 
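
The frustration of a single triangle can be checked by brute force. The sketch below is a deliberate simplification: it treats each spin as a two-valued (up or down) variable rather than the quantum S = ½ spins discussed in the text, and the function and variable names are illustrative only. It counts how many nearest-neighbour pairs can be made antiparallel at once on a triangle and, for comparison, on a square plaquette.

```python
from itertools import product

# Nearest-neighbour bonds, written as pairs of site indices.
TRIANGLE_BONDS = [(0, 1), (1, 2), (0, 2)]
SQUARE_BONDS = [(0, 1), (1, 2), (2, 3), (3, 0)]


def antiparallel_bonds(spins, bonds):
    """Count the bonds whose two spins point in opposite directions."""
    return sum(1 for i, j in bonds if spins[i] != spins[j])


def frustration_summary(n_sites, bonds):
    """Return (max number of antiparallel bonds, list of configurations reaching it)."""
    configs = list(product((+1, -1), repeat=n_sites))
    best = max(antiparallel_bonds(c, bonds) for c in configs)
    ground_states = [c for c in configs if antiparallel_bonds(c, bonds) == best]
    return best, ground_states


if __name__ == "__main__":
    for name, n, bonds in [("triangle", 3, TRIANGLE_BONDS), ("square", 4, SQUARE_BONDS)]:
        best, states = frustration_summary(n, bonds)
        print(f"{name}: {best} of {len(bonds)} bonds antiparallel, {len(states)} best states")
```

On the square all four bonds can be satisfied, in only two ways; on the triangle at most two of the three bonds can be antiparallel, and six configurations tie for that optimum. That large set of equally good arrangements is the hallmark of geometric frustration.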


The lowest-energy arrangement of a kagome 
antiferromagnet might be expected to be one 
in which neighbouring spins are oriented 120° 
to each other (Fig. 1c). However, the physicist 
Philip Anderson pointed out that, for several 
lattices composed of triangles’ in which there is 
just one electron per atom (that is, when S = ½),
it is more favourable for neighbouring spins to 
pair up in a manner analogous to that of the 
electron spins in chemical bonds. Quantum 
mechanics allows the spins in ‘valence’ bonds 
to be simultaneously up-down and down-up 
(where ‘up’ and ‘down’ are the two possible
alignments of the spins), thus relieving geo- 
metric frustration by effectively allowing pairs 
of local moments to cancel out. 

Because such coupling may occur between 
all pairs of spins, the overall picture is that of a
‘liquid’ of valence bonds, a state that resonates 
between all possible ways of making such 
bonds (Fig. 1d). This state is called a quantum 
spin liquid, or a resonating valence bond (RVB) 
liquid (by analogy with Linus Pauling’s model? 
of chemical bonding in organic molecules such 
as benzene, in which the chemical structure can 
also be described as a hybrid of different valence- 
bond arrangements). It was proposed* — 
again by Anderson — that in the cuprate 
class of high-temperature superconductors, 
the RVB state enables the formation of 
superconducting charge carriers. 

Among lattices of triangles, kagome lattices 
that have antiferromagnetically coupled spins 
of S = ½ have been regarded for some time as
prime candidates for an RVB state. But an undistorted
realization of this highly prized system
(herbertsmithite, which has the nominal
formula ZnCu₃(OH)₆Cl₂) was only recently
reported’ by Nocera’s group. To be precise, it 
is the array of spins on the copper ions (Cu²⁺)
in herbertsmithite that could form an RVB
magnetic state (Fig. 1e).

It was soon demonstrated‘ that, although 
there is a very strong antiferromagnetic 
coupling between spins in the mineral, no spin 
freezing could be observed, even at tempera- 
tures as low as 50 millikelvin. This is consistent 
with the existence of some form of RVB state in 
herbertsmithite. But evidence was also found 
for disorder in the compound's structure, rais- 
ing concerns that it was not as clean a model 
system as had been hoped. Neutron-diffraction
data, combined with elemental analysis
using inductively coupled plasma atomic
emission spectroscopy (ICP-AES) on herbertsmithite,
indicated that about one-quarter of
the sites expected to be occupied by zinc ions
(Zn²⁺) are actually occupied by Cu²⁺ (Fig. 1e),
and that one-twelfth of the Cu²⁺ sites on the
kagome lattice are occupied by Zn²⁺. The
actual level of disorder was estimated to be 
slightly lower, however, on the basis of analy- 
ses of the compound’s magnetic susceptibil- 
ity’ and heat capacity*. Both of these properties 
contain a contribution from nearly free spins 
(which interact only very weakly with other 
Figure 1 | Spin arrangements in crystal lattices. Unpaired electrons on atoms have a magnetic
moment, or spin. These spins adopt preferred alignments (indicated by red arrows) in crystal lattices.
a, Antiferromagnetic alignments, in which all neighbouring spins are antiparallel (as in this square
lattice) are often favoured. b, In kagome lattices, triangles of atoms are joined at their corners, making
a completely antiparallel spin arrangement impossible. c, This can lead to a compromise arrangement
in which the spins are oriented at 120° to each other. d, Alternatively, a quantum spin liquid might form
as a combination of many states (one of which is depicted here) in which spins pair up, analogous to
the pairing of electrons in chemical bonds. e, The mineral herbertsmithite (ZnCu₃(OH)₆Cl₂) contains
kagome layers of Cu²⁺ ions (blue) linked by O²⁻ ions (red), separated by layers of Zn²⁺ (grey) and Cl⁻ ions
(not shown). Defects occur in the lattice when Cu²⁺ ions occupy Zn²⁺ sites and vice versa (indicated by the
double-headed arrow between atoms marked ‘X’ and ‘Y’). Nocera and colleagues¹ find that the number
of defects in herbertsmithite is smaller than had been thought. This makes the mineral an ideal model for
studying spin-liquid states, which have been implicated in high-temperature superconductivity.


spins in the material). This contribution 
can be attributed to the S = ½ spins of Cu²⁺
ions occupying one-fifth of the Zn²⁺ sites.

Nocera and colleagues appreciated that the 
uncertainty about the disorder was unsatisfac- 
tory — they felt that rigorous determination 
of the exact level of disorder should depend 
on structural and chemical methods alone. 
They therefore performed¹ new measurements
on herbertsmithite, using a suite of
techniques: X-ray absorption spectroscopy 
and advanced X-ray- and neutron-diffraction 
methods. Taken together, their results show 
that about 15% of the Zn²⁺ sites are occupied
by a Cu²⁺ ion, thus confirming the earlier
interpretations of the magnetic susceptibil- 
ity’ and heat capacity* of herbertsmithite, and 
convincingly dispelling the last doubts about 
the concentration of Cu²⁺ ions (and so of S = ½
spins) on the Zn²⁺ sites. These defect spins will
interact only weakly with other spins, and thus 
will have very little influence on the behaviour 
of the spins on the kagome lattice. 

What comes as a surprise in Nocera and 
colleagues’ results¹ is that the concentration of
Zn²⁺ ions in the kagome Cu²⁺ sites was found
to be only about 1 ± 3%. This is good news for
those investigating quantum magnetism — it 
means that there are very few vacancies in the 
periodic array of spins forming the kagome lat- 
tice, allowing for a reliable comparison between 
theory and experiment in herbertsmithite. The 
lower than expected chemical disorder implies 
that the chemical formula of herbertsmithite
is probably closer to Zn₀.₈₅Cu₃.₁₅(OH)₆Cl₂
rather than ZnCu₃(OH)₆Cl₂, as was thought
previously. This revised formula will doubt- 
less be checked in other laboratories using the 
ICP-AES method of chemical analysis. 
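
The link between site disorder and chemical formula is simple bookkeeping, sketched below. The function name and the assumption of exactly one interlayer Zn site and three kagome Cu sites per formula unit (taken from the nominal formula) are illustrative; the occupancy fractions are the approximate figures quoted in the text, and the second call additionally assumes no Zn at all on the kagome sites.

```python
def formula_unit(cu_on_zn_sites, zn_on_cu_sites, n_zn_sites=1, n_cu_sites=3):
    """Return (Zn, Cu) atoms per formula unit for given antisite fractions.

    cu_on_zn_sites: fraction of interlayer Zn sites occupied by Cu.
    zn_on_cu_sites: fraction of kagome Cu sites occupied by Zn.
    """
    zn = n_zn_sites * (1 - cu_on_zn_sites) + n_cu_sites * zn_on_cu_sites
    cu = n_cu_sites * (1 - zn_on_cu_sites) + n_zn_sites * cu_on_zn_sites
    return zn, cu


if __name__ == "__main__":
    # Earlier estimate: one-quarter of Zn sites hold Cu, one-twelfth of Cu sites hold Zn.
    print([round(x, 2) for x in formula_unit(1 / 4, 1 / 12)])
    # Revised picture: about 15% of Zn sites hold Cu, kagome sites taken as fully Cu.
    print([round(x, 2) for x in formula_unit(0.15, 0.0)])
```

Under the earlier estimate the two kinds of antisite defect cancel and the nominal 1:3 ratio of Zn to Cu is preserved. With about 15% Cu on the Zn sites and essentially no Zn on the kagome sites, the composition shifts to roughly 0.85 Zn and 3.15 Cu per formula unit.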

As the first RVB system in which spin corre- 
lations and dynamics can readily be measured, 
herbertsmithite has great potential to reveal 
the character of spin liquids. More broadly, it 
should also allow an exploration of the rela- 
tionship between antiferromagnetism and 
superconductivity in layered transition-metal 
compounds.
Emerging properties of animal gene regulatory networks


Eric H. Davidson¹


Gene regulatory networks (GRNs) provide system level explanations of developmental and physiological functions in the 
terms of the genomic regulatory code. Depending on their developmental functions, GRNs differ in their degree of 
hierarchy, and also in the types of modular sub-circuit of which they are composed, although there is a commonly 
employed sub-circuit repertoire. Mathematical modelling of some types of GRN sub-circuit has deepened biological 
understanding of the functions they mediate. The structural organization of various kinds of GRN reflects their roles in 
the life process, and causally illuminates both developmental and evolutionary process. 


The body plan of an animal, and hence its exact mode of development,
is a property of its species and is thus encoded in the
genome. Embryonic development is an enormous informational
transaction, in which DNA sequence data generate and guide the system-wide
spatial deployment of specific cellular functions. GRNs also
determine the main events of postembryonic development, including
organogenesis and formation of adult parts and cell types. Beyond that,
GRNs control a vast array of physiological capabilities and modes of
response to environmental fluctuations and challenges. GRNs are composed
of multiple sub-circuits, that is, the individual regulatory tasks
into which a process can be parsed are each accomplished by a given
GRN sub-circuit’*. Thus the operational significance of a GRN structure
will be indicated by the types of sub-circuit it contains. However,
GRNs have more global organizational properties as well. The comparative
review below shows that GRNs may be deeply layered, generating
successive regulatory transactions, or they may be shallow, in the sense
that they mandate few transactions between the initial inputs and the
terminal activation of effector genes.


The developmental GRN sub-circuit repertoire 


Modular GRN sub-circuits are defined by their topologies, and the 
topology of a sub-circuit directly indicates its function in life. In this 
article I am concerned only with sub-circuits which perform devel- 
opmental biology jobs that can be defined uniquely, and not with very 
common ‘motifs’ such as the coherent feed forward loop, which 
although it has specific dynamic properties’, appears in so many differ- 
ent contexts that no unique developmental biology function can be 
associated with it. Table 1 contains a compilation of sub-circuits drawn 
from all the various GRNs considered in this review, together with an 
abbreviated description of their regulatory functions, and abbreviated 
diagrams illustrating the canonical sub-circuit structures. Additional 
sub-circuits will be found as more developmental GRNs are explored, 
but the basic import of Table 1 is that there probably exists a small, finite 
number of sub-circuit topologies out of which developmental programs 
of all kinds are constructed. The first entry, for example, is a spatial 
information processing sub-circuit called the double-negative gate, 
found in the sea urchin embryo GRNs”*. This sub-circuit consists of 
two genes encoding repressors wired in tandem, so that the target of the 
first repressor is the gene encoding the second, plus downstream regulatory
genes which are targets of the second repressor. Its function is to
ensure that the target genes are expressed only where the first repressor
is (transiently) active (domain X), while these genes are shut down
everywhere else (1 − X); this is what we term an X, 1 − X processor.
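
The logic of the double-negative gate can be written as a two-step Boolean rule. This is only a sketch: it reduces gene activity to on/off values, ignores timing and concentrations, and the function and argument names are hypothetical. It assumes, as the description of the sub-circuit suggests, that the second repressor is expressed wherever the first is absent and that the target genes have a broadly available positive input.

```python
def double_negative_gate(first_repressor_active, activator_present=True):
    """Return True if the target genes are expressed in a given region.

    The first repressor shuts off the gene encoding the second repressor;
    the second repressor, where present, overrides the targets' activator.
    """
    second_repressor_present = not first_repressor_active
    return activator_present and not second_repressor_present


if __name__ == "__main__":
    regions = {"domain X": True, "everywhere else (1 - X)": False}
    for region, first_active in regions.items():
        state = "ON" if double_negative_gate(first_active) else "OFF"
        print(f"{region}: target genes {state}")
```

Targets come on only in domain X, where the first repressor removes the second, and stay off elsewhere even though their activator is present. That asymmetry is what makes the gate a spatial processor rather than a simple switch.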

References in Table 1 generalize the point that structurally similar 
sub-circuits, but composed of different regulatory genes, are repeatedly 
encountered doing similar developmental jobs in different GRNs. At 
root this is because what the circuit can do depends directly on its 
structure; for example, in a recent study, a search of all possible small 
sub-circuits based on 3-node topologies showed that only two are cap- 
able of response to a signal followed by return to the original state’. 

A given sub-circuit structure implies a given function, and in develop- 
ment there is a finite set of regulatory functions required. The biochem- 
ical complexity of the diverse cis-regulatory systems composing 
developmental GRN sub-circuits, and the diversity of the sets of tran- 
scription factors which animate them, thus may give way to a pleasingly 
simple set of logic-processing sub-circuit topologies. This would be a 
very important outcome, for it would make it possible to parse the 
apparently enormous mazes of interconnections in system level GRNs 
into modules of developmental logic function, and thus to understand 
how GRNs control the biology we see. 


Deep structure of embryonic GRNs 


The GRNs which control the de novo formation of embryonic territories 
typically include many different functional sub-circuits, which govern 
successive ‘layers’ of process. They are hierarchical in their overall struc- 
ture. Their depth simply reflects the long sequence of regulatory steps 
required to complete any component of embryonic development. The 
concept of deep as opposed to shallow GRN structure can be simply 
considered as the number of successive changes in regulatory state 
required to generate an episode of embryological or other development, 
between the initial state and the terminal process which the GRN causes 
to happen. The terminal outcome is, by definition, the activation of 
cohorts of effector genes (that is, differentiation and cell biology genes, 
as opposed to only regulatory genes). In relatively shallow GRNs, some of 
which are considered below, the initial state may be a paused regulatory 
condition just upstream of expression of a differentiation gene battery. 
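
One way to make the notion of depth concrete is to count the longest chain of regulatory steps from the initial state to an effector gene battery. The sketch below assumes that a GRN can be abstracted as an acyclic graph of sub-circuits connected by regulatory feeds; the function name, the node labels and both toy networks are hypothetical and are not taken from any published GRN.

```python
from functools import lru_cache


def grn_depth(feeds, start, effectors):
    """Longest number of regulatory steps from `start` to any effector node.

    feeds: dict mapping each sub-circuit to the sub-circuits it feeds into
           (assumed acyclic). Returns None if no effector is reachable.
    """
    @lru_cache(maxsize=None)
    def longest_from(node):
        if node in effectors:
            return 0
        depths = [longest_from(nxt) for nxt in feeds.get(node, ())]
        reachable = [d for d in depths if d is not None]
        return 1 + max(reachable) if reachable else None

    return longest_from(start)


# Toy 'deep' network: interpretation, lockdown and driver layers precede effectors.
deep_feeds = {
    "initial inputs": ("interpretation",),
    "interpretation": ("lockdown", "signalling"),
    "lockdown": ("exclusion", "drivers"),
    "drivers": ("effector genes",),
}
# Toy 'shallow' network: a paused state sits directly upstream of the effectors.
shallow_feeds = {"paused state": ("effector genes",)}

if __name__ == "__main__":
    print(grn_depth(deep_feeds, "initial inputs", frozenset({"effector genes"})))
    print(grn_depth(shallow_feeds, "paused state", frozenset({"effector genes"})))
```

In these toy cases the deep network needs four successive steps and the shallow one a single step, which mirrors the contrast drawn in the text between layered embryonic GRNs and GRNs that start from a paused regulatory condition.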
The sea urchin embryo endomesoderm GRN serves as a reference 
point, as at present it is the most nearly complete, predictively useful, 
and validated large scale developmental GRN available. Structure/function
aspects of this GRN have been reviewed recently’, and an always
current version, together with underlying data and dynamic presentations
by domain, is available at http://sugp.caltech.edu/endomes/.

Table 1 | Sub-circuit repertoire for developmental GRNs

The concept of GRN depth is illustrated in Fig. 1a by abstracting from the sea
urchin GRN the sequence of sub-circuits deployed in order to specify its 
skeletogenic cell lineage’, which produces only the one cell type, and is 
developmentally the simplest process modelled in the whole endome- 
soderm GRN. This portion of the network contains 24 regulatory genes 
and several signalling genes, as well as a sampling of downstream differentiation
genes. Without detailing the individual genes and linkages in
the skeletogenic GRN, its internal structure is abstractly represented in
Fig. 1a as a series of interconnected boxes, each of which represents a
GRN sub-circuit that executes the indicated regulatory task. Many of 
these sub-circuits are among the types listed in Table 1, as indicated by 
the colour coding, and the arrows show the linkages from one sub- 
circuit to another, that is, they represent transcription factors generated 
in one box and used for control of gene(s) in the next box, the inputs or 
‘feeds’ into each sub-circuit. The boxes are layered hierarchically, with 
those that initiate the process at the top. Figure 1a includes various
control processes that are common throughout embryonic develop- 
ment, because the problems that have to be solved are general: the initial 
spatial inputs have to be interpreted, the regulatory state then has to be 
locked down (the initial inputs are always transient), signals have then to
be generated, other states have to be excluded, and differentiation drivers
have to be activated. It is not surprising that all this requires a lot
of sequential circuitry, even given the relative simplicity of skeletogenic
lineage development. The GRNs underlying specification of the
mesoderm, endoderm!'”* and of the oral and aboral ectoderm’ of the sea 
urchin embryo are similarly deep, layered and hierarchical. 

GRNs for specification of mesoderm in Xenopus embryos", for spe- 
cification of the gut and mesoderm cell lineages in Caenorhabditis ele- 
gans embryos", and for specification of endoderm and dorsal, anterior 
and ventral gene expression domains"’, and of mesoderm” in zebrafish, 
also display deep, hierarchical organizations. Some of these GRNs are 
compilations from the literature or from chromatin immunoprecipita- 
tion (ChIP)-chip observations, and have not been validated by direct 
perturbation analysis, let alone at the cis-regulatory level, but it is 
unlikely that their overall structure is illusory. 

The GRN for dorsal/ventral patterning in Drosophila”, which does 
have extensive cis-regulatory support, is also hierarchical, but its unusual 
structure reflects the unusual developmental process it controls. In this 
Table 1 | Continued
The role of the sub-circuit is given in column 1; its name in column 2; a description of its function in column 3; and the sub-circuit structure in column 4. Numbers in column 2 are keyed to Fig. 1. See references
indicated for actual occurrences, exact circuit topologies, and discussion of information processing specifics. In each case the functions of the circuit are hardwired in its cis-regulatory target sites. In Topologies 
column, all genes encode transcription factors unless otherwise noted. 

* Regulatory genes that create initial regulatory state are controlled by widely expressed repressor, which is dominant over their positive inputs, and gene encoding this repressor is itself specifically repressed in a
local region (X) by another gene encoding a different repressor: hence target genes are ON in X, specifically repressed elsewhere.

Many developmental signalling systems (for example, Notch, Wnt) activate immediate early response factors in cells receiving ligand, but in absence of ligand, these factors act as dominant repressors of the same target genes.
{ Dynamic in that continuing transcription is required. 
§ Exclusion sub-circuits are activated as downstream outputs of specification GRNs. 


{| From ref 2. 

#L, gene encoding signalling ligand. 

vx R encodes repressor; L encodes signalling ligand. 
** This example was adapted from Ref 36. 


HS, signal; triangle represents graded signal strength 


fAutoregulatory loops lock on whichever state the system goes to. 


embryo regulatory state domains are initially set up very quickly in a 
syncytium, without intercellular signalling, although following cellular- 
ization, signalling dominates further transcriptional functions and the 
GRN henceforth has a typical structure. In the syncytial embryo spatial 
stripes of both dorsal/ventral and anterior/posterior regulatory gene 
expression, which specify the future multicellular embryonic territories, 
are generated by parallel cis-regulatory responses to maternally localized 
and zygotically expressed combinations of diffusing transcription fac- 
tors'*"!”. Many of these factors act as repressors in setting spatial bound- 
aries. Important initial inputs in this system are the transcription factors 
Dorsal and Bicoid, encoded by maternal messenger RNAs which 
become distributed in graded fashion in the syncytial embryo nuclei, 
from ventral to dorsal and anterior to posterior, respectively. Cis-reg- 
ulatory modules have been isolated that control target genes expressed 
in stripes at given ranges of values of these ‘morphogens’. When assoc- 
iated with reporters, and introduced into the egg, these cis-regulatory 


| A unique circuit design here is that the ligand gene is activated by the same signal transduction mechanism reception of the ligand activates in recipient cells; a positive intercellular feedback. 


+ Conceived as a means of obtaining different discrete transcriptional responses from a graded signal; see discussion of this type of circuitry in section on mathematical models below. 


88S), So, different signal inputs gene B is subject to additional transcriptional repression in certain regulatory states. 
||| This design precludes necessity for ad hoc Hill coefficients as in 5, 6.1; see section on mathematical models below. 


modules produce stripes of expression at the appropriate positions along 
the respective axes. It has been assumed for a long time that the positions 
where they operate are determined by the quantitative values of the Dorsal 
or Bicoid concentrations at those locations, which in some way these cis- 
regulatory systems read. Much recent evidence, however, shows that the 
positions where these cis-regulatory modules act depend on combinatorial 
activator and repressor inputs'*”, and the quantitative values of these 
‘morphogens’ alone do not by themselves predict the spatial expression 
domains of their target genes (except sometimes at the extreme positions 
where their concentrations are highest). The combinatorial inputs are the 
products of regulatory genes linked into the GRN”*. Thus the concept that 
the position of target gene expression is determined solely by the quantitative
value of the ‘morphogen’ is overly simplistic: the overall pattern, and
the overall signal strength response mechanism, are actually network 
properties rather than a property of individual cis-regulatory modules that 
independently and quantitatively read single gradient values. 
Figure 1 | ‘Birdseye’ views of structural properties of representative
developmental GRNs. a-c, Diagrammatic view of sub-circuits and sub-circuit 
functions in three different GRNs. Each box represents a GRN sub-circuit 
consisting of a small number of regulatory genes and their functional linkages. 
Coloured dots and numbers refer to the similarly coded sub-circuit types in 
Table 1. Red arrows indicate linkages between sub-circuits, that is, regulatory 
feeds from one sub-circuit to another. a, GRN for skeletogenic mesoderm 
lineage specification in sea urchin embryos’. b, GRN for pancreatic 
developmental process’, leading to β-cell specification and insulin gene
transcription. c, GRNs typical of terminal binary fate choices in haematopoietic 
stem cells and other similar situations, as discussed in text. 


Vertebrate embryos also use graded inputs, but in this case the developmental
systems are cellular, and the ‘morphogens’ are diffusible extracellular
signal ligands. Distinct regulatory responses occur, dependent
on the intensity of signalling, resulting in activation of different genes in 
different locations in the embryo, for example in response to an activin 
gradient in the pre-gastrular Xenopus embryo”. Modelling shows that 
this particular level-sensitive response could be mediated by a specific 
type of GRN sub-circuit, in which regulatory genes encoding repressors 
reciprocally damp each other’s expression, while responding differenti- 
ally in a cooperative, thus nonlinear and discontinuous way, to the 
concentration of the signal (Table 1). However, as we see in the following,
there is more than one type of sub-circuit capable of discontinuous
response to a graded signal. 


Structure of GRNs encoding body parts 


We now have bits and pieces of the GRNs controlling development of 
body parts and organs in later embryonic development, usually their 
initial stages. Like embryonic GRNs, they are deep and hierarchical, and 
indeed are in most ways structured similarly to the early embryo GRNs. 
The box diagram cartoon in Fig. 1b provides an example. This diagram 
is abstracted from a GRN for specification of pancreas and then of 
pancreatic β-cells*. In adult body part formation, including organogenesis,
the first step is always establishment of a given regulatory state in
the field of cells from which the body part will form, the progenitor 
field”, for example, the cardiac crescent, or the limb bud, or the imaginal 
disc. The progenitor field is positioned with respect to the coordinates of 
the developing organism, which always involves signal-mediated 
installation of a new regulatory state. But then the field is subdivided
into the regulatory state domains of its subparts, and at each step the 
state is locked down. This used to be called ‘pattern formation’, when 
people were looking at only one or a few genes at a time. As in early 
embryo GRNs the main job in setting up the parts and future form of the 
organ is the progressive deployment of regulatory states in space. It is 
essential to realize that this process is not to be equated with terminal cell 
fate specification; the cells expressing patterned regulatory states are yet 
far upstream in the developmental process from their ultimate descen- 
dants, which will eventually differentiate in various directions, accord- 
ing to what part of the organ they arise in. 

The similarity between GRNs encoding adult body part formation and 
those controlling earlier embryogenesis is also sustained at the sub-circuit 
level in that few additional types of sub-circuit are used. For instance, in 
pre-gastrular embryonic specification it can be confidently predicted that 
a feedback circuit locking two or three regulatory genes in a mutual 
positive embrace will be encountered just downstream of the initial 
inputs used to set up a given regulatory state (Table 1). This is seen at 
multiple locations in the sea urchin embryonic GRNs’*”’, and the same 
feature routinely appears in adult body part GRNs: for example, in those 
underlying development of neural crest in lamprey”, gut specification in 
vertebrates”, eye lens field specification in both vertebrates*® and 
Drosophila’’, haemangioblast specification in mouse**, pharynx spe- 
cification in C. elegans’, heart specification in mammals and 
Drosophila®*°*!, and pancreatic β-cell specification in mouse’. Each of
these GRNs includes two- or three-gene positive feedback sub-circuits
functioning to lock down newly installed spatial regulatory states. The 
similar feedback sub-circuits are constructed with different genes; again it 
is the sub-circuit topology that determines function, and many different 
regulatory genes can have the same roles. An additional type of sub- 
circuit that is commonly seen in later embryonic processes is a signal- 
mediated, mutual repression device that operates across a cellular bound- 
ary, such that, on either side, reception of a signal from the other side 
specifically causes repression of key genes of the alternate regulatory state 
(Table 1). Four examples from later development where this type of 
circuitry obtains are in the GRN that maintains the distinct regulatory 
states of anterior and posterior parasegment compartments in 
Drosophila’, the GRN controlling establishment of dorsal/ventral neural 
tube domains in vertebrates*’, the GRN controlling anterior versus pos- 
terior specification in the vertebrate limb bud*’, and the GRN controlling 
cell type specification under signal control in the C. elegans vulva”. 


Postembryonic developmental GRNs: differentiation from pluripotent stem cells


A remarkably recurrent similarity in GRN circuit design has recently 
emerged in studies of the transcriptional pathways that control binary 
fate choices executed in the diversification of haematopoietic cell types 
from multipotent precursors (for reviews, see refs 36-38). At the cores of 
these circuits, which use some overlapping and some lineage-specific 
regulatory genes, are pairs of genes encoding transcription factors that 
mutually antagonize each other’s expression within the same nucleus. 
Often initially co-expressed at relatively low levels, the lineage fate 
choice depends on stepped up asymmetric expression of one or the other 
of the core repressor gene pair. Each of these genes also directly or 
indirectly promotes expression of positive regulators necessary for exe- 
cution of one of the lineage fate choices. As the activity of one of the core 
repressors increases, it causes transcriptional extinction of expression of 
the alternative choice, and the irreversible installation of its own positive 
regulatory state (see discussion of the mathematical features of such 
circuits below). An important point is that the genes of the antagonistic 
repressor pairs, and/or the regulatory genes that are their immediate 
targets, also provide direct positive or negative inputs into terminal 
differentiation genes of the alternate lineages’. In other words, this 
apparatus is deployed immediately upstream of the drivers of the 
effector genes that generate the features of given cell types (Fig. 1c). In 
comparison to the embryonic GRNs just considered, these are relatively 
shallow networks. Ultimately the decisive inputs into one or the other of
the core repressors are provided by extrinsic signalling ligands, for 
example cytokines and growth factors, including Notch and Tgfβ, or
endogenous immune receptor signals. The binary choice transcriptional 
apparatus responds to signal intensity, so that a low input gives one 
result and a high input another. Different pairs of repressor genes per- 
form similar roles in different lineage fate choices, but what is remark- 
able is the similar circuitry adduced throughout haematopoietic 
diversification. Transcriptional balance between pairs of cross-antagon- 
istic repressors decides the outcome, for instance, in myeloid progeni- 
tors giving rise to macrophages or neutrophils”; in precursors that may 
give rise to either B cells or macrophages*, where there is cis-regulatory 
evidence of the transcriptional cross-repression; in the upper level 
decision point where erythroid versus myeloid fates bifurcate’*; and in
the erythroid versus platelet fate decision’. Similarly, in T-cell diversification
between helper versus killer fates”, T-cell receptor signal strength
indirectly controls repressor function, a compelling case because there is 
direct cis-regulatory evidence of the reciprocal transcriptional silencing 
interactions*®. 
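
The behaviour described above for a pair of cross-antagonistic repressors can be sketched numerically. The following minimal Python sketch is not taken from any of the cited studies. It assumes a symmetric two-gene circuit in dimensionless form, dx/dt = a/(1 + y^n) - x and dy/dt = a/(1 + x^n) - y. The names `simulate_toggle`, `a` and `n`, and all numerical values, are hypothetical choices made for illustration. The slight initial excess of one repressor stands in for the asymmetric input that tips the choice.

```python
def simulate_toggle(x0, y0, n, a=3.0, dt=0.01, steps=5000):
    """Euler-integrate a symmetric pair of mutually repressing genes.

    dx/dt = a / (1 + y**n) - x
    dy/dt = a / (1 + x**n) - y

    x and y are the (dimensionless) levels of the two repressors, a is the
    maximal synthesis rate and n is the repression exponent. All values are
    illustrative assumptions, not measured parameters.
    """
    x, y = x0, y0
    for _ in range(steps):
        dx = a / (1.0 + y ** n) - x
        dy = a / (1.0 + x ** n) - y
        x, y = x + dt * dx, y + dt * dy
    return x, y


if __name__ == "__main__":
    for n in (1, 4):
        # Same circuit, started with a small bias towards x or towards y.
        x_biased = simulate_toggle(1.2, 1.0, n)
        y_biased = simulate_toggle(1.0, 1.2, n)
        print(f"n={n}: bias to x -> {x_biased}, bias to y -> {y_biased}")
```

Under these assumptions, with n = 4 a small initial bias towards either repressor is amplified until the other is almost silenced, whereas with n = 1 both starting conditions relax to the same mixed state. The exponent n is a modelling assumption here, not a measured property of any circuit.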

Although to some it is tempting to view all development through the 
same lens, there are fundamental differences between the terminal fate 
choice circuitry discussed here and the GRNs that execute early and 
mid-stage embryonic development of animal body parts. 
Differentiation gene batteries can be activated only at the end of the 
series of GRN transactions that decide exactly where they are to be 
deployed. Haematopoietic cell fate decisions occur at the end of a com- 
plex prior developmental process, and in fact as discussed below, the 
circuitry controlling very early haematopoietic stem cell pluripotential- 
ity operates in an entirely different manner from the binary choice 
circuitry just considered**’. In their function, haematopoietic binary 
choice sub-circuits are similar to the terminal sub-circuits that elsewhere 
in development immediately determine deployment of differentiation 
gene batteries. This perhaps explains why a characteristic of the stem cell 
differentiation choice systems, in other words the simultaneous low level 
expression in the multipotent precursors of differentiation genes indi- 
cative of multiple possible fates**“’ (‘lineage priming’), is not seen in 
embryonic fate choices. That is, in embryonic body part development 
the spatial fate decision is made far up in the GRN hierarchy, and locked 
down, long before the differentiation gene battery is deployed. In con- 
trast, in the production of functional immune cell types the last steps in 
the decision have to be deferred until the multipotential cells can be told
which of their potentialities is more needed. Similar binary choice circuitry
is also used in non-haematopoietic developmental contexts, but again at 
late stages in a given process where a terminal fate choice is to be made. 
For example, after mammalian somites have formed, they generate spa- 
tially confined subdomains, one of which is the dermomyotome. This 
consists of multipotent stem-like cells, where the choice to generate 
vascular muscle versus smooth muscle cell types is controlled by a 
modulated signal, mutual repression between the Pax3 and Foxc2 
genes”’. Another non-haematopoietic circuit that in essence is remark- 
ably similar to the antagonistic haematopoietic repressor pair sub-cir- 
cuits was discovered in C. elegans, also operating at the terminus of 
much prior development’. This circuit maintains the expression of 
distinct sets of differentiation genes expressed in left versus right taste 
neurons, but the duelling repressors expressed alternately in these two 
neurons are in this case microRNAs that directly target the mRNAs 
encoding the alternate differentiation drivers. All of these kinds of
sub-circuits operate to choose, and/or to maintain the choice of, one
of an alternative pair of differentiation gene driver sets.

A priori, development of the body plan cannot be reduced to differ- 
entiated cell type specification, the last step in the process, nor to binary 
decisions between alternative fates. This is at root because development 
of the body plan requires a long sequence of multidimensional spatial 
decisions: during pattern formation spatial regulatory states must be 
installed progressively within multiple (>2) diverse boundaries, and 
also in certain anterior-posterior and dorsal-ventral positions with 
respect to the body plan. In each structure of the body regulatory states
that include differentiation gene battery drivers are finally installed. 
Thus it is not in principle surprising that if the set of differentiation 
gene battery regulators is changed by experimental intervention, a dif- 
ferent cell type can be made to appear. Many recent studies show that 
insertion of vectors expressing sets of transcription factors or even single 
transcription factors can result in the change of differentiated state from 
one haematopoietic cell type to another*®; from fibroblast to neuron”, 
from exocrine to pancreatic β-cell°’, etc. These cell fate changes all occur
near the far downstream periphery of GRN hierarchy, as symbolized in 
Fig. 1. Growing a new cell type requires activation of a new differenti- 
ation battery, whereas growing a new body part requires a prior process 
of spatial pattern formation driven by a deep GRN. More generally, 
although there are embryonic processes that look superficially like the 
binary choices just discussed, they are effected very differently. As an 
example, in the sea urchin embryo, endomesodermal precursor cells 
give rise both to mesoderm and to endoderm, fates driven by entirely 
distinct regulatory states. But a careful experimental analysis® shows that 
there is no pluripotential ‘endomesodermal’ GRN, and instead a Delta/
Notch signal activates a set of regulatory genes which constitute a meso- 
derm GRN, while in the same cells a Wnt/Tcf signal activates a different 
set of regulatory genes which constitute the endoderm GRN. The genes 
of the mesoderm GRN and of the endoderm GRN are expressed inde- 
pendently of one another, without any interactions. The cells of each 
regulatory state are then separated physically by a cell division, so that 
the Notch signal is received exclusively by one ring of cells, which 
becomes mesoderm, while the other cells express the endoderm GRN 
exclusively*. Nor are the exclusion functions (Table 1) that in given 
regulatory states act to repress genes key to alternative regulatory states 
‘bipotential switches’. These sub-circuits are used to lock down regula- 
tory choices already installed rather than to make choices. They may 
look superficially like the mutual repression sub-circuits that switch 
lineages bipotentially, but they are not. 


Differentiation gene battery structure 


Differentiation gene batteries account for functional cell type specificity, 
and a canonical network structure can be associated with them. This 
structure describes the topology of the regulatory relationships causing 
the protein coding differentiation genes of the battery to be expressed 
more or less coordinately. Differentiation gene batteries are per se shal- 
low, relatively simply constructed types of sub-circuit, often wired in 
coherent feed forward format, as for example in sea urchin embryos’, 
pancreatic β-cells*, and macrophages”. As the immediately upstream
GRNs are being uncovered, an additional characteristic of differenti- 
ation gene battery regulatory circuitry is often encountered: this is the 
occurrence of feedback between the drivers of the differentiation genes 
just upstream of the linkages to the effector genes, either auto- or cross- 
regulatory”*”*, though this is not always seen”. The canonical form is 
that of Fig. 2a. Differentiation gene batteries consist of a sometimes very 
large number of effector genes, the relevant cis-regulatory modules of 
which (per battery) respond to members of a small set of transcription 
factors present as part of the terminal regulatory state. However, each 
such cis-regulatory module may in addition be serviced by some addi- 
tional factors, which accounts for the fact that all the genes of the battery 
are not exactly expressed in lockstep’. For example, muscle protein 
genes are activated by two or three of the transcription factors ortholo- 
gous to Srf, Mef2, and a myogenic bHLH factor in vertebrates”, plus, 
individually, other factors; whereas in C. elegans the differentiation 
genes of each class of neuron are identified by their response to a single 
key transcription factor, sometimes together with other factors”. 

It is logically consistent that where there is direct repression of differ- 
entiation gene batteries by a proximal control circuit (‘anti-differentiation’) 
much the same architecture would be employed. In embryonic stem cells a 
hierarchical GRN that maintains the pluripotent state is headed by a 
recursive triple feedback system that links Nanog, Oct4 (also known as 
Pou5f1) and Sox2 genes**’. Apparently directly downstream of this are
linkages to many genes encoding transcriptional activators and repressors”,
including a polycomb repressor that in turn targets regulatory
genes associated with various differentiation states’. But also among the
immediate targets of the triple feedback loop is the Rest gene, which
encodes a factor that directly represses neurogenic differentiation genes”.
This circuit is the mirror image of gene battery activation circuits.

Figure 2 | Structural characteristics of downstream effector gene cassettes
and their control functions. a, Typical differentiation gene battery, as
discussed elsewhere’. Here each effector gene codes for a cell-type-specific
protein required to generate the cell-specific output. These effector genes are all
transcribed specifically in the given cell type in response to a small number of
regulatory factors, which are themselves the output of the controlling
specification GRN. Every effector gene of the battery is specifically controlled
by these inputs. The immediate drivers of the battery shown cross-regulate (as
is often the case). b, Structure that may be typical of morphogenetic effector
gene cassettes. Here the output of the specification GRN is used to control
transcription of only a minor fraction of key effector genes, and these in some
way trigger or nucleate the process. But many of the proteins required for the
function are widely expressed.


Structure/function relations for GRNs controlling diverse kinds of biology

The downstream effector gene cassettes required for development 
include those executing morphogenetic cell biology functions, as well 
as differentiation gene batteries. A distinction is that by definition, dif- 
ferentiation genes are expressed cell type-specifically, whereas genes 
required for functions such as motility, ingression, invagination, 
cell division, convergent extension, tube formation, branching, shape 
remodelling, epithelial-mesenchyme transition, etc., may be deployed in 
many diverse cell types and many diverse contexts in development. 

If we imagine a canonical differentiation gene battery to be structured 
as in Fig. 2a, how different will be the topology of a morphogenetic gene 
cassette? One possible clue comes from various studies on GRN linkages 
that execute transcriptional control of cell replication in developing sys- 
tems. The spatial patterns of cell replication of course affect morphology, 
because the size and shape of given portions of a structure depend on the 
number of rounds of cell division mediated by the regulatory state in each 
developing region. In several cases the exact outputs of a developmental 
GRN that specifically control cell cycle activity have been determined. For 
example in developing pituitary, several linkages from the specification
GRN directly control proliferation®: the Pitx1 gene provides inputs into 
the cyclin D1 gene; the Six gene acts to repress expression of a cell cycle 
arrest kinase; and Six1 plus other factors of the pituitary regulatory state 
activate c-myc (also known as Myc). In the developing zebrafish eye the 
GRN linkage to cell cycle control is regulation of cyclin D1 and c-myc (also 
known as myca/mycb) by the meis1 regulatory gene*’. Thus, so to speak, 
these GRNs deploy the complex process of cell division by pressing a 
small number of regulatory ‘buttons’. 

Perhaps only a subfraction of the effector genes in a morphogenetic 
gene cassette are transcriptionally regulated by direct inputs from the 
upstream GRN. This concept emerged from a study of the migration of 
heart precursor cells in developing Ciona®’, one of the few system-level 
investigations we have into the transcriptional control of a morphoge- 
netic function. A large number of cell biology genes participate in the 
processes of membrane protrusion and motility required for heart cell 
migration, but most of these genes are widely expressed. Migratory 
activity is specifically deployed by transcriptional activation of the 
rhoDF gene, which encodes a key required GTPase, and it is this gene 
which is directly controlled by the cis-regulatory outputs of the upstream 
GRN. The same principle is evident in a study of trichome formation in 
Drosophila®’. Here again, an extensive patterning GRN lies upstream, 
and determines the location of the morphological features and their cellular
progenitors. The remodelling of epidermal cell shape to produce
trichomes (or alternately, smooth cuticle) is controlled by expression of 
the regulatory gene shavenbaby (also known as ovo), and some of its 
direct effector gene targets are known. But these are again only a fraction 
of the total genes whose products are required to build the trichome. If 
these examples are a guide, the wiring of differentiation gene batteries, in 
which every downstream gene is a specific target of the GRN (Fig. 2a), is 
distinct from the way morphogenetic gene cassettes may be wired 
(Fig. 2b). Many of the genes contributing to a morphogenetic cell bio- 
logy process may be widely expressed and only a few key ‘button’ genes 
that functionally nucleate the whole process are transcriptionally con- 
trolled by GRN outputs, to deploy the process spatially. Were this a 
general result, it would promise the existence of simple regulatory levers 
by which morphogenetic cassettes could be re-deployed, either in evolu- 
tion or in re-engineering projects, to which we return below. 

A uniquely explanatory GRN analysis of innate immunity response 
mechanisms in dendritic cells, following stimulation of Toll-like recep- 
tors (TLRs), shows how a classic physiological response is pro- 
grammed at the genomic level. Stimulation of TLRs 2, 3 and 4 with 
various agonists activates two partly overlapping response programs 
of effector gene expression, in other words an antiviral program and 
an inflammatory program. This study included all regulatory genes 
specifically involved in the process, and the architecture of the GRN 
was based on a comprehensive, quantitative perturbation analysis, using 
small hairpin RNAs (shRNAs) to block regulatory gene transcription,
although no direct cis-regulatory validation of the GRN structure was 
reported. Several interesting differences and also similarities emerge in 
comparing the structure of this physiological response GRN to that of 
the developmental GRNs considered above. A salient similarity is in the 
structure of the effector gene sets. Like many differentiation genes, the 
TLR response effector gene sets are largely wired to their drivers in 
coherent feed forward loops. Another now familiar feature is the use 
of positive feedback that will lock down the regulatory state following a 
transient input, here between stat genes high up in the antiviral response 
GRN hierarchy. This GRN is of moderate depth: downstream of the stat 
genes are three other regulatory genes linked to the stat genes and to one 
another by cross-regulatory interactions, and downstream of these in 
turn are further regulatory genes, and then the effector genes. A further 
device in these GRNs that also is often used in development, is exclusion 
of the alternate regulatory state by specific cross-repression, once one of 
the pathways is active. The depth of the inflammatory hierarchy is only 
that of the feed forward circuitry. Physiological systems are homeostatic, 
and a special feature of this one is a self-cancelling repression circuit the 
sequence-specific basis of which is, however, yet unknown. Some years
ago a prescient analysis predicted that in general, developmental GRNs 
which control progressive irreversible regulatory processes would have 
considerably greater depth than does reversible physiological response 
circuitry®, and this turns out to be exactly true. 

One way of summarizing the result of a comparative meta-analysis of 
GRNs controlling diverse kinds of biological processes is to consider 
their similarities and differences in the same terms: they are similar in 
that all the GRNs considered here are modular constructs of a basic 
repertoire of sub-circuit topologies (Table 1); but they differ in their 
global hierarchical organization, which reasonably reflects the biological 
jobs they execute. 


Insights into process from mathematical models of GRNs and sub-circuits


Space confines the following discussion to recently conceived models 
based ab initio on experimentally generated, largely validated network 
topologies. The major focus is on how, or whether, mathematical analyses
of the models have succeeded in enriching our understanding of the
biological functionalities of the observed circuitry. 

Beginning with a known network topology, the common objective is 
to generate a dynamic mathematical model, either using continuous 
(ordinary differential equations or ODE) or Boolean approaches”. For 
large scale temporal models of embryonic spatial specification systems 
involving many genes and interactions, this often involves a great num- 
ber of unmeasured parameters, and epistemological issues immediately 
arise. In many such works arbitrary parameter values are systematically 
explored until the expected results emerge, but this is inherently at least a 
partially circular logic, since it assumes a priori that the model is right. 
Of course where there are applicable experimental measurements of the 
output kinetics, the model is better constrained, but then the novelty of 
the biological insights that can be expected is limited because both the 
input relationships and the results are assumed. Drosophila gap gene 
expression in the syncytial embryo provides the best known large devel- 
opmental data set thus far subjected to mathematical kinetic analysis”. 
Extensive genetic and cis-regulatory data partially specify the embryonic 
interaction networks of these genes’*’°. Mathematical models were built 
assuming the network topologies proposed in prior work”*”, and fit to a 
very high quality set of quantitative kinetic measurements which capture 
the empirical dynamics of changing gap expression patterns in the pre- 
cellularization 13th-14th cleavage cycle’’”’. There were two outcomes 
relevant to the structure/function relationships of this developmental 
GRN: First, a dynamic image of how the gap gene transcription factors 
operate emerged, illuminating what might be called the cell biology of 
the process (were there cells). Second, the analysis suggested several 
additions and corrections of unresolved details of gap gene interactions. 
But largely the outcome was just that if one does the math and the 
measurements, everything turns out to make sense. 
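
To make the contrast between the two modelling styles concrete, here is a minimal Boolean sketch in Python. The three-gene circuit is hypothetical and is not any of the networks cited above: gene A is switched on by a signal and maintained by gene B, B is activated by A, and gene C of an alternative regulatory state is repressed by A. The synchronous update rule is itself a simplifying assumption. No kinetic parameters are needed, which is the appeal of the Boolean approach when few have been measured.

```python
from itertools import product


def step(state, signal):
    """One synchronous Boolean update of a hypothetical three-gene circuit."""
    a, b, c = state
    return (
        signal or b,  # A: switched on by the signal, then maintained by B
        a,            # B: activated by A, closing the positive feedback
        not a,        # C: gene of the alternative state, repressed by A
    )


def run(state, signals):
    """Apply a sequence of signal values and return the visited states."""
    trajectory = [state]
    for s in signals:
        state = step(state, s)
        trajectory.append(state)
    return trajectory


def fixed_points(signal=False):
    """Enumerate all states that the update rule leaves unchanged."""
    return [s for s in product((False, True), repeat=3) if step(s, signal) == s]


if __name__ == "__main__":
    start = (False, False, True)  # A off, B off, C on
    # Transient input: signal present for two steps, then withdrawn.
    for state in run(start, [True, True, False, False]):
        print(state)
    print("steady states without signal:", fixed_points(False))
```

With the signal absent the sketch has two steady states, (A, B, C) = (off, off, on) and (on, on, off). A two-step signal pulse moves the circuit from the first to the second, where it stays after the signal is withdrawn.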

An important area of developmental biology in which modelling has 
contributed novel mechanistic understanding is transcriptional res- 
ponse to signals. We cannot here deal with the many studies focused 
on dynamic spatial distributions of signal ligands per se. But mathemat- 
ical analyses of models capturing transcriptional network circuitry 
downstream of intercellular signalling have illuminated developmental 
signal response in multiple ways. These models concern smaller and well 
constrained sub-circuits, rather than whole GRNs, and often either 
parameters can be reasonably approximated, or dimensionless 
approaches can be found. The signal-driven transcriptional patterning 
process by which the two dorsal respiratory appendages on the roof of 
the Drosophila egg are positioned affords an example”’. An experiment- 
ally based network circuitry animated by spatially confined epidermal 
growth factor (EGF) and Dpp signalling was used to produce a dynamic 
mathematical model which satisfactorily interprets the changing pattern 
of expression of a key gene of the pro-appendage regulatory state in 
dorso-anterior follicle cells. The model thus explains how this system 
generates and positions the bilateral spots of gene expression where the 
appendages will form, which is not otherwise transparent. Furthermore,
in consequence of a conflict between prediction and experimental obser- 
vations, the analysis required a hitherto unsuspected positive feedback 
loop by which Dpp controls expression of its own receptor. A second 
example concerns transcriptional interpretation of graded hedgehog 
(Hh) signals in the developing neural tube, which results in a ventral 
to dorsal series of spatial regulatory state domains each of which gives 
rise to certain neuronal types**. When experimental measurements of 
signal intensity over time in the various transcriptional domains were 
analysed mathematically’””*, it emerged that the successive ventral to 
dorsal transcriptional domains are defined by the integrals over duration
and intensity of Hh signalling, rather than simply by ‘morphogen
concentration’, as always assumed previously. A third example”*° 
relates to the Wnt signalling required in Xenopus embryos to activate 
key regulatory genes of the dorsal organizer. Experimental perturbations 
of this canonical developmental signalling system showed that this sys- 
tem responds to the ratio of the (signal) input at some given time, to its 
level when the signalling began (‘fold change’), and not to absolute signal 
level (the same phenomenon is often seen in other contexts, for example, 
sensory physiology). A predicted explanation in terms of network sub- 
circuit topology was then derived from a dynamic mathematical analysis 
of the incoherent feed forward sub-circuit®®, which showed that this 
commonplace sub-circuit possesses the capacity to respond to fold 
change in input magnitude, rather than to absolute input magnitude. 
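
The fold-change property can be illustrated with a deliberately simplified incoherent feed-forward model. This Python sketch is an assumption-laden toy, not the published analysis: input X activates both Y and Z, Y represses Z strongly, and the dimensionless equations dY/dt = X - Y and dZ/dt = X/Y - Z, the function name `peak_response` and the integration settings are all hypothetical choices.

```python
def peak_response(x_before, x_after, dt=0.001, steps=20000):
    """Peak level of Z after the input X steps from x_before to x_after.

    Toy incoherent feed-forward loop (illustrative assumptions only):
        dY/dt = X - Y        # X activates the intermediate Y
        dZ/dt = X / Y - Z    # X activates Z, Y represses Z
    """
    y = x_before  # steady state of Y before the step (Y = X)
    z = 1.0       # steady state of Z before the step (X / Y = 1)
    peak = z
    for _ in range(steps):
        dy = x_after - y
        dz = x_after / y - z
        y += dt * dy
        z += dt * dz
        peak = max(peak, z)
    return peak


if __name__ == "__main__":
    print("1 -> 2  :", peak_response(1.0, 2.0))
    print("5 -> 10 :", peak_response(5.0, 10.0))
    print("1 -> 4  :", peak_response(1.0, 4.0))
```

In this toy model a step from 1 to 2 and a step from 5 to 10 give the same peak in Z, because only the ratio X/Y enters the equation for Z, while a fourfold step gives a larger peak.
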
As noted above, another general area in which modelling has illumi- 
nated process in respect to given sub-circuit topology is in binary cell 
fate choice, following a precursor phase in which both regulatory states 
are weakly expressed. Here the repeatedly observed sub-circuit structure 
features the opposition of two antagonistic repressors, each, if highly 
expressed, capable of shutting off the alternative regulatory state and 
generating its own, and each animated by inputs that reflect the external 
need for its pathway. A canonical approach to dynamic mathematical 
modelling of this type of sub-circuit has been repeatedly applied, based 
essentially on treating transcriptional activation abstractly as a catalytic 
Michaelis-Menten process, and repression in the same vein (for
example, refs. 38, 81). The object is to demonstrate that these ‘duelling 
repressor’ sub-circuit topologies indeed encode regulatory systems that 
are capable of hysteretically moving from the precursor state to one or 
the other terminal regulatory states, depending on the inputs the system 
receives. But a problem with this approach is that as conventionally 
formulated, the bi-stable mathematical behaviour requires the comple- 
tely ad hoc assumption of large exponential (Hill) coefficients in the 
repression functions (that is, coefficients >2, and often much larger 
values have to be assumed in order to generate the expected behaviour). 
Although Hill coefficients of these magnitudes physically imply coop- 
erativity, or additional (unknown) reactions, they are customarily 
inserted in the computations despite lack of any direct biological evid- 
ence for cooperativity or other physical features that would justify them. 
Indeed, in one recent study of another very similarly wired haemato- 
poietic choice system™, the erythroid/myeloid fate choice, it was pointed 
out that the specific, mutually repressive cis-regulatory interactions 
which were obtained are known not to be multimeric and cooperative, 
nor is there any other biochemical justification for high Hill coefficients. 
Instead an alternative regulatory architecture was considered, the 
dynamic mathematical analysis of which resulted in the prediction that 
the sub-circuit should include in addition to the antagonistic repressors 
another gene or genes operating according to specified network lin- 
kages. The latter work*’, furthermore, used a now classic probabilistic 
thermodynamic treatment of cis-regulatory transcription factor bind- 
ing* that is directly based on transcription factor-DNA interaction 
physical chemistry. This same thermodynamic approach to modelling 
cis-regulatory transcription factor binding has been used for analysis of 
an entirely different type of sub-circuit operating at the initial devel- 
opmental appearance of pluripotential haematopoietic stem cells**“”. 
This sub-circuit consists of three positively active genes. There are no 
cross-regulating repressors in this sub-circuit, and the three genes are 
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linked by multiple positive auto- and cross-regulatory linkages. In life 
and in the model, extrinsic signals switch it irreversibly into an active 
state; otherwise, if one node is inhibited, it remains off’’. Thus there are 
multiple different designs that confer signal-dependent bi-stability. 
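The dependence of the 'duelling repressor' behaviour on the Hill coefficient can be seen in a minimal sketch. It assumes a symmetric, dimensionless two-gene model in which each product represses the synthesis of the other. The parameter values, function names and integration scheme are hypothetical and chosen only for illustration; they do not reproduce any of the cited models.

```python
def simulate_toggle(u0, v0, hill_n, alpha=4.0, dt=0.01, t_end=60.0):
    """Euler integration of an assumed mutual-repression sub-circuit.

        du/dt = alpha / (1 + v**n) - u
        dv/dt = alpha / (1 + u**n) - v

    u and v are the two antagonistic repressors, n is the Hill coefficient.
    Returns the state (u, v) reached at t_end.
    """
    u, v = u0, v0
    for _ in range(int(t_end / dt)):
        du = alpha / (1.0 + v ** hill_n) - u
        dv = alpha / (1.0 + u ** hill_n) - v
        u += dt * du
        v += dt * dv
    return u, v


def is_bistable(hill_n, alpha=4.0):
    """True if opposite starting biases end in different final states."""
    u_a, _ = simulate_toggle(alpha, 0.0, hill_n, alpha)
    u_b, _ = simulate_toggle(0.0, alpha, hill_n, alpha)
    return abs(u_a - u_b) > 1e-3


if __name__ == "__main__":
    for n in (1.0, 4.0):
        print("Hill coefficient", n, "bistable:", is_bistable(n))
```

With these assumed parameters, a non-cooperative repression function (n = 1) relaxes to a single shared state whatever the starting bias, whereas a steep function (n = 4) retains the starting bias. This is the reason why large coefficients are inserted in such models.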

The thermodynamic binding approach*’ was also used earlier for 
dynamic modelling of sea urchin embryo gene cascades™. The import- 
ant insight emerged that in a cascade where a given gene activates a 
second downstream gene, significant expression of the second gene 
occurs long before the product of the first gene reaches steady state, 
and the whole dynamic system operates in a ‘forward drive mode’ rela- 
tively insensitive to levels of upstream activators. The kinetics of such 
embryonic regulatory interactions are not narrowly determinate, as 
emphasized by the kinetic ‘sloppiness’ of a process which operates suc- 
cessfully at different rates at different temperatures within and between 
similar species, and in which there is a significant range in the concen- 
trations of many transcription factors embryo to embryo”. 
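A minimal sketch suggests why a downstream gene can be strongly driven long before its activator reaches steady state. It assumes the simplest equilibrium description, a single binding site whose occupancy is K[A] / (1 + K[A]); the function names and numerical values are hypothetical and far simpler than the published cascade models.

```python
def occupancy(activator, k_eq):
    """Equilibrium occupancy of a single site (assumed simplest case)."""
    x = k_eq * activator
    return x / (1.0 + x)


def activator_fraction_at_half_occupancy(k_eq, a_ss):
    """Fraction of the steady-state activator level a_ss at which site
    occupancy reaches half of its eventual steady-state value."""
    target = 0.5 * occupancy(a_ss, k_eq)
    # Invert occupancy = K*A / (1 + K*A) for the activator level A.
    a_half = target / (k_eq * (1.0 - target))
    return a_half / a_ss


if __name__ == "__main__":
    # Hypothetical numbers: K * a_ss = 20, i.e. a near-saturating activator.
    print(activator_fraction_at_half_occupancy(k_eq=2.0, a_ss=10.0))
```

Under these assumptions the target site is half as occupied as it will ever be when the activator has reached under a twentieth of its final level, so further accumulation of the activator changes downstream output comparatively little.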

For other situations in embryonic development where the object is to 
encompass a complex, large scale spatial specification system rather 
than to follow a given small domain or cell type through time, conven- 
tional, stand-alone dynamical analysis is the wrong tool for the job that 
really needs to be done. Returning to Table 1, for example, we see that 
there are several kinds of spatial specification sub-circuit, that in cellular 
early embryos produce novel spatial regulatory state patterns, for 
example X, 1 − X spatial processing sub-circuits and AND spatial logic 
processors. These, and indeed many other embryonic specification pro- 
cesses that define multicellular territorial regulatory states, result in a 
progressive Boolean-like pattern of diverse regulatory states confronting 
one another sharply across territorial cellular boundaries. A model that 
would capture what the GRN really does must address this kind of 
outcome, capturing the encoded input information-processing beha- 
viour at each cis-regulatory module of the GRN. 
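The Boolean-like character of such outcomes can be sketched as follows. The row of cells, the input names and the two functions are hypothetical illustrations of AND logic and of an X, 1 − X exclusion pattern; they are not a model of any particular GRN.

```python
def and_module(input_a, input_b):
    """AND spatial logic: the target gene is on only where both
    regulatory inputs are present in the same cell."""
    return input_a and input_b


def exclusion_states(x_present):
    """X, 1 - X processing: where regulatory state X is on, the
    alternative state is off, and vice versa."""
    return x_present, not x_present


# Hypothetical row of five cells with two overlapping input domains.
input_a = [True, True, True, False, False]
input_b = [False, True, True, True, False]

target_pattern = [and_module(a, b) for a, b in zip(input_a, input_b)]
boundary_pattern = [exclusion_states(x) for x in target_pattern]

if __name__ == "__main__":
    print(target_pattern)     # [False, True, True, False, False]
    print(boundary_pattern)
```

The output is a sharply bounded domain in which the target is on, flanked by cells in the alternative state, with no intermediate levels.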

Current developmental GRNs mainly concern, on the one hand, far 
upstream hierarchical transactions that essentially execute regulatory 
state pattern formation, or on the other, far downstream differentiation 
gene batteries and their immediate governance. These will have to be 
much better linked, so that we have a continuous understanding of the 
control systems from the top of the hierarchy to all the effector genes of a 
developing system. This kind of global GRN will be much larger than 
anything we have at present. Other kinds of global GRNs are on the 
horizon as well, such as those that encompass all parts of a developing 
embryo through time. Experimentally validated GRNs that include 
complete large regulatory systems will present enormous computational 
challenges for modelling, presentation, logic analysis and modular 
abstraction. 


Developmental GRNs and evolutionary mechanism 


Because development of the body plan is caused by the operation of 
GRNs, evolutionary change in the body plan is change in GRN structure 
occurring over deep time****. Evolution and development emerge as 
twin outputs of the same mechanistic domain of regulatory system 
genomics. It is therefore to be expected that, at the level of GRN struc- 
ture, each would illuminate the other, and so indeed they already do in 
several concrete ways. To start, it is obvious that if there is indeed a finite 
repertoire of network sub-circuits used to effect development, the evolu- 
tion of development has to be considered as the process of assembly, 
reassembly, and redeployment of these sub-circuits. This general idea 
will become directly testable by widespread evolutionary comparisons, 
as the GRNs underlying the development of diverse animal forms 
become available. Structural comparison of GRNs between forms of 
known phylogenetic relation in turn reveals the modularity of GRN 
structure, by revealing sub-circuit boundaries, as when a sub-circuit is 
inserted wholesale into a new GRN context*’. Furthermore, the sub- 
circuits of which GRNs are composed change during evolution at dif- 
ferent rates, highlighting the linkages belonging to the most conserved 
sub-circuit in a GRN comparison. As discussed elsewhere, in general the 


918 | NATURE | VOL 468 | 16 DECEMBER 2010 


oldest GRN features are certain differentiation gene batteries***, which 
are eumetazoan (cnidarian + bilaterian) in distribution. In contrast, the 
morphogenetic programs that pattern each form of body plan are by 
definition clade-specific**. Certain remarkably conserved regulatory 
sub-circuits that are located near the top of developmental GRN hier- 
archy may serve to lock down developmental process specific to given 
phyla or classes (GRN kernels)***’. Thus GRNs are historically as well as 
structurally and functionally modular, in that they are a mosaic of sub- 
circuits of diverse antiquity and phylogenetic distribution. Systematic 
exploration of phylogenetically related GRNs at different distances is 
valuable not only to discover the evolutionary origins of each sub-cir- 
cuit, but also to reveal which kinds of sub-circuits and linkages are 
inherently flexible and which not. This brings us to the most important 
point for the future. In order to probe control of spatial regulatory state, 
laboratory strategies will need to be designed for changing GRNs by 
insertion of network regulatory apparatus into developing systems. But 
this is the same kind of change that happened in evolution, and the 
results will be mutually informative. Thus a practical convergence is 
on the horizon. Re-engineering spatial developmental processes, and 
recreating evolutionary processes, while different in motivation, will 
both depend on fundamental understanding and experimental manip- 
ulation of the structure/function relations of developmental GRNs. 

The processes we have been discussing, development and evolution of 
the body plan, and execution of physiological responses, devolve caus- 
ally from the regulatory genome. We need to understand GRNs because 
they encompass the primary output of the regulatory genome, itself the 
fundamental and unique outcome of more than 600 million years of 
animal evolution®. 
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Light-avoidance-mediating 
photoreceptors tile the Drosophila 


larval body wall 


Yang Xiang¹, Quan Yuan¹, Nina Vogt², Loren L. Looger³, Lily Yeh Jan¹ & Yuh Nung Jan¹ 


Photoreceptors for visual perception, phototaxis or light avoidance are typically clustered in eyes or related structures 
such as the Bolwig organ of Drosophila larvae. Unexpectedly, we found that the class IV dendritic arborization neurons of 
Drosophila melanogaster larvae respond to ultraviolet, violet and blue light, and are major mediators of light avoidance, 
particularly at high intensities. These class IV dendritic arborization neurons, which are present in every body segment, 
have dendrites tiling the larval body wall nearly completely without redundancy. Dendritic illumination activates class 
IV dendritic arborization neurons. These novel photoreceptors use phototransduction machinery distinct from other 
photoreceptors in Drosophila and enable larvae to sense light exposure over their entire bodies and move out of danger. 


Light sensing is critical for animal life. Whereas image-forming visual 
perception allows animals to identify and track mates, predators and 
prey, non-image-forming functions regulate pupil reflex, phototaxis 
and circadian entrainment’’. In addition to eyes'’, extra-ocular 
photoreceptors exist’ °. For example, many eyeless or blinded animals 
can sense illumination of their body surfaces’. Birds possess deep- 
brain photoreceptors in their hypothalamus’, and extra-ocular photo- 
receptors are required for magnetic orientation of amphibians’. Recent 
studies demonstrate that eyeless animals such as Caenorhabditis ele- 
gans nonetheless have photoreceptors controlling light avoidance**. 

Drosophila larvae spend most of the time feeding by digging into 
food. Light avoidance is a crucial behaviour to minimize body expo- 
sure. When tested in groups in a dark/light choice assay, Drosophila 
larvae prefer darkness'"'”. This behaviour requires the pair of Bolwig 
organs on the larval head”; that is, primitive eye structures each 
comprised of 12 photoreceptors expressing Rh5 or Rh6, rhodopsins 
sensing blue and green light, respectively’’. 


Cells besides Bolwig organs contribute to photoavoidance 
We designed a photoavoidance assay for a single larva with sunlight- 
level intensities (~1 mW mm⁻² in San Francisco on a clear day in 
June, consistent with previous reports’). Wild-type Drosophila larvae 
showed avoidance of a white light spot of 0.57 mW mm⁻² (Fig. 1a and 
Supplementary Movie 1). Surprisingly, similar avoidance (Fig. 1b and 
Supplementary Movie 2) was exhibited by larvae with their Bolwig 
organs ablated by the pro-apoptotic gene Head involution defective 
(Hid; also called Wrinkled (W))'° expressed via the Bolwig-organ- 
specific promoter Glass Multimer Reporter (GMR)'*° (Supplementary 
Fig. 1a). Lower light intensities elicited less photoavoidance of wild- 
type animals, and even less of Bolwig-organ-ablated animals (Fig. 1c). 
However, at light intensities of 0.57 mW mm⁻² or higher, GMR-Hid 
larvae showed avoidance comparable to wild-type animals (P > 0.05). 
Thus, although the Bolwig organs are responsible for dim light avoid- 
ance and dark congregation", Drosophila larvae must contain extra- 
ocular photoreceptors. 

Testing the wavelength dependence of photoavoidance using band- 
pass filters letting through ultraviolet (~360 nm; Fig. 1d), violet 
(~402 nm; Fig. 1e), blue (~470 nm; Fig. 1f), green (~525 nm; 
Fig. 1g) or red light (~620 nm; Fig. 1h), we found that wild-type animals 
showed increased photoavoidance with higher light intensity (Fig. 1d- 
h), and were most sensitive to blue, violet and ultraviolet, and largely 
unresponsive to green and red light. Bolwig-organ-ablated animals 
showed less photoavoidance at low light intensity, but exhibited nearly 
normal avoidance response to high-intensity, short-wavelength light 
(Fig. 1d—f), demonstrating the existence of light-sensitive cells in addi- 
tion to Bolwig organs. Because there was no detectable temperature 
increase associated with 0.11 mW mm⁻² violet light (Supplementary 
Fig. 2), which triggered avoidance in nearly 80% of the wild-type and 
GMR-Hid animals, and because animals showed little response to high- 
intensity green or red light but strongly avoided low-intensity short- 
wavelength light (Fig. 1d-h), light avoidance probably involves 
wavelength-dependent photoreceptors rather than local heating. 


Class IV neurons tiling larval body wall sense light 

Given the report of diffusely distributed dermal photoreceptors trig- 
gering shadow reaction*°, we tested whether sensory neurons in the 
larval body wall could be candidate photoreceptors. Using GCaMP3, a 
genetically encoded calcium indicator'”"’’, we found that blue light 
delivered for 5 s to the dorsal cluster (Fig. 2a, see Fig. 3c for whole larva 
image) generated a marked fluorescence increase specifically in the 
soma, axon and dendrites of ddaC, a class IV dendritic arborization 
neuron (Fig. 2b, e, f), but not in nearby sensory neurons (Fig. 2b, e, f). 
ddaC also responded to ultraviolet light (which also caused photo- 
bleaching), but not to green light (Fig. 2c-f). There were also Ca²⁺ 
increases specifically in class IV dendritic arborization neurons of the 
ventral and lateral clusters (VdaB and V’ada, respectively) in response 
to ultraviolet and blue light, but not green light (Supplementary Figs 3 
and 4). Similar GCaMP3 fluorescence responses were seen in class IV 
dendritic arborization neurons in body segments from head to tail 
(Supplementary Fig. 5). 
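For readers unfamiliar with calcium imaging, fluorescence responses of this kind are conventionally expressed as a change relative to baseline, ΔF/F. The sketch below assumes that the baseline F0 is the mean of the frames recorded before the stimulus; the function name and the trace values are hypothetical and do not represent the analysis pipeline used in this study.

```python
def delta_f_over_f(trace, n_baseline):
    """Relative fluorescence change, (F - F0) / F0, for each frame.

    trace       fluorescence intensity per frame (arbitrary units)
    n_baseline  number of leading frames, recorded before the stimulus,
                averaged to give the baseline F0 (an assumed convention)
    """
    if n_baseline <= 0 or n_baseline > len(trace):
        raise ValueError("n_baseline must be between 1 and len(trace)")
    f0 = sum(trace[:n_baseline]) / n_baseline
    return [(f - f0) / f0 for f in trace]


if __name__ == "__main__":
    # Hypothetical trace: three baseline frames, then a light-evoked rise.
    example_trace = [100.0, 100.0, 100.0, 150.0, 200.0]
    print(delta_f_over_f(example_trace, n_baseline=3))
```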

Extracellular recording further revealed a progressive increase in 
action potential frequency when the dorsal class IV dendritic arbori- 
zation neuron ddaC was illuminated with increasing intensity of blue 
light (Fig. 2g), ultraviolet light and violet light, but not red light 


¹Howard Hughes Medical Institute, Departments of Physiology, Biochemistry, and Biophysics, University of California San Francisco, San Francisco, California 94158, USA. ²Center for Developmental 
Genetics, New York University, New York, New York 10003, USA. ³Howard Hughes Medical Institute, Janelia Farm Research Campus, Ashburn, Virginia 20147, USA. 
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contribute to photoavoidance. 
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(Supplementary Fig. 6). Responses were: 340 nm > 380 nm > 402 
nm > 470 nm > 525 nm or 620 nm light (Fig. 2h). 

The wavelength dependence of ddaC firing rate increase was similar 
to that observed with GCaMP3 imaging and the light avoidance beha- 
vioural assay. The latency between the onset of light stimulation and 
action potential burst firing decreased with higher light intensity, and 
was as short as 1s with bright illumination (Supplementary Fig. 6). 
When illuminated with 1.4 mW mm⁻² of white light (approximating 
sunlight), ddaC neurons in the dorsal cluster showed a significant 
firing increase (Fig. 2i). Similar robust activation of ventral (VdaB) 
and lateral (V’ada) class IV dendritic arborization neurons was 
induced by 52.8 mW mm⁻² blue light (Supplementary Fig. 7a, b). 
The response of class IV dendritic arborization neurons was similar 
regardless of their location along the body axis (data not shown), as in 
the case of GCaMP3 imaging (Supplementary Fig. 5). 

We did not observe any significant effects of light on firing rate of 
class I or III dendritic arborization neurons (Supplementary Fig. 7c, d, 
P > 0.05). Because class I dendritic arborization neurons progressively 
increased their firing rate as the temperature was raised above 30 °C, 
whereas class IV dendritic arborization neurons showed an abrupt 
increase of firing rate only above 40 °C (Supplementary Fig. 8), thermal 
responses cannot account for the light-induced increase of firing in 
class IV but not class I dendritic arborization neurons. Moreover, 
application of 10 μM H₂O₂, which elevates the reactive oxygen species 
(ROS) level in Drosophila larvae, had no effect on the firing rate of 
class IV dendritic arborization neurons (Supplementary Fig. 9). These 
studies demonstrate that ultraviolet, violet and blue light activate class 
IV dendritic arborization neurons in an intensity-dependent manner. 
Responses occur at sunlight-level intensities, are not induced by heat or 
ROS, correlate with behaviour, and are confined to this specific class of 
sensory neurons throughout the animal. 


Light activates class IV neurons and dendrites in isolation 


The dendritic arborization neurons have dendrites in contact with 
epithelial cells whereas their somas and axons are wrapped by glia”’. 
To test whether class IV dendritic arborization neurons can sense light 
by themselves, we prepared primary neuronal cultures””? from 
embryos expressing GCaMP3 and RFP specifically in class IV dendritic 
arborization neurons by means of pickpocket-GAL4 (ppk-GAL4)”. 
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Ultraviolet and blue light illumination of isolated class IV dendritic 
arborization neurons generated a robust increase of GCaMP3 signals 
(Fig. 3a and Supplementary Fig. 10). In contrast, cultured class III 
dendritic arborization neurons expressing GCaMP3 and RFP via 
19-12-GAL4 yielded no light response (Fig. 3b). Thus, class IV den- 
dritic arborization neurons have the intrinsic ability to detect light. 

Dendrites of class IV dendritic arborization neurons tile the larval 
body wall with non-overlapping but complete coverage of the dend- 
ritic field**’° (Fig. 3c). Illumination of only the dendrites of class IV 
dendritic arborization neurons (Fig. 3d) with ultraviolet, violet and 
blue light, but not green or red light, activated the neurons (Fig. 3e). 
The activation spectrum is similar to that for illumination of the entire 
class IV dendritic arborization neurons (Fig. 2h), indicating the pres- 
ence of phototransduction machinery in the dendrites. 


Gr28b is critical for light transduction in class IV neurons 


No defects in light response of class IV dendritic arborization neurons 
were found in available mutants of rhodopsins’*”’ and cryptochrome 
(cry)’*, as well as a mutant in no receptor potential A (norpA), which 
encodes phospholipase C (PLC), downstream of rhodopsins” 
(Fig. 4a). We then tested the Drosophila homologue of Lite-1, a 
C. elegans light sensor*'°. The closest homologue of lite-1 in 
Drosophila is gustatory receptor 28b (Gr28b), annotated as encoding 
a gustatory G-protein-coupled receptor. Several Gr28b-GAL4 lines 
carrying different promoter regions revealed consistent expression 
in all class IV dendritic arborization neurons, two sensory neurons 
in the lateral body wall, plus several neurons in the ventral nerve cord 
(Supplementary Fig. 11), as reported previously”. To test for the 
functional role of Gr28b in the light-induced electrophysiological 
responses, we recorded from class IV dendritic arborization neurons in 
Dmel\Mi{ET1}Gr28b?°"*8 (MiET1) and Dmel\PBac{PB}Gr28b°!8** 
(PBac) larvae with P-element insertion into the Gr28b coding and intro- 
nic regions, respectively (http://flybase.org/reports/FBgn0045495.html). 
Whereas these P-element insertions did not alter the basal firing rate 
(data not shown), they caused a significant reduction in light-induced 
responses of class IV dendritic arborization neurons (Fig. 4b), as in hemi- 
zygous larvae carrying one MiET1 allele and one deletion encompassing 
Gr28b (Df(2L)Exel7031; http://flybase.org/reports/FBab0037910.html) 
(Fig. 4c). The MiET1 P-element inserts into the coding sequence 
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Figure 2 | Light activates class IV dendritic 
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difference (middle panel minus left panel), with 
ddaC dendrites (arrow) and axon (arrowhead) 
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common to all reported transcripts, and its mobilization for excision 
restored the light-induced response in class IV dendritic arborization 
neurons (Fig. 4d). Moreover, knockdown of Gr28b expression with 
UAS-RNAi driven by ppk-GAL4 caused an overall reduction of light 
response of class IV dendritic arborization neurons (Fig. 4e). Taken 
together, our data indicate that Gr28b is expressed in class IV dendritic 
arborization neurons, and is required for proper light responses. 
Whether Gr28b is the direct photosensing molecule awaits further 
experimentation. 

Sequence analysis revealed that Gr28b has a rhodopsin-like structure 
plus one extra transmembrane segment (Supplementary Fig. 12), rais- 
ing the question of whether the Gr28b-dependent light response 
involves G-protein signalling. To test whether G-protein signalling is 
required in class IV dendritic arborization neurons, we applied the 
myristoylated βγ-binding peptide mSIRK, and found that the light 
response in class IV dendritic arborization neurons was significantly 
reduced (Supplementary Fig. 13). Thus, G-protein signalling is probably 
involved in the light response of class IV dendritic arborization neurons, 
similar to findings in C. elegans’. We further tested cyclic nucleotide- 
gated (CNG) channels, which are known to act downstream of Lite-1 
and G proteins in C. elegans’. Unlike in C. elegans, blocking CNG 


channels with L-cis-diltiazem in class IV dendritic arborization neurons 
had no effect on their light responses (Supplementary Fig. 14). 


TrpA1 is required in light transduction in class IV neurons 


Transient receptor potential (TRP) channels were first identified and 
characterized in the Drosophila compound eye”*’, with TRP and 
TRP-like (trpl) having key roles in phototransduction”. However, 
our electrophysiological studies revealed no defects in the light res- 
ponse of class IV dendritic arborization neurons in trpl or painless 
mutant larvae (Supplementary Fig. 15). 

TrpA1, a Drosophila homologue of mammalian TrpA, may func- 
tion as a thermosensor in larvae and adults***, and a receptor for 
reactive electrophiles such as allyl isothiocyanate (AITC)**. A TrpA1 
mutant exhibited normal basal firing in class IV dendritic arborization 
neurons (data not shown), but no light-induced firing increase 
(Fig. 4f). As reported previously”, we detected strong TrpA1 immuno- 
reactivity in several neurons in the larval brain but not in peripheral 
neurons (data not shown). We then performed MARCM (mosaic 
analysis with a repressible cell marker)**, and found no light response 
in class IV dendritic arborization neurons lacking TrpA1 (Fig. 4g), and 
a significant reduction of light-induced firing of the heterozygous class 
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Figure 3 | Cell-autonomous activation of class IV dendritic arborization 
neurons by light. a, b, Quantification of somatic fluorescence changes (ΔF/F) 
in response to 5 s light and 100 μM allyl isothiocyanate (AITC) stimulation of 
cultured class IV (a) and III (b) dendritic arborization neurons; RFP signals 
serve as control. n = 10-13 (light) and n = 4 (AITC) in a, n = 9 in b. c, Larva 
with class IV dendritic arborization neurons labelled with GFP by ppk-GAL4. 
Dendrites tile the body wall. Boxed area shows an abdominal hemi-segment; 
three dotted circles mark soma positions of D (dorsal, ddaC), L (lateral, V’ada) 
and V (ventral, VdaB) class IV dendritic arborization neurons, respectively. Up, 
dorsal; left, anterior. Scale bar, 200 μm. d, Illumination of dendrites within the 
dotted circle of GFP-labelled ddaC dendrites. Up, dorsal. Scale bar, 50 μm. 

e, Responses of ddaC with dendritic illumination. n = 5. *P < 0.05, **P < 0.01, 
***P < 0.001; two-tailed paired t-test. All error bars indicate s.e.m. 


IV dendritic arborization neurons (Supplementary Fig. 16), indicat- 
ing that TrpA1 is present at levels below immunodetection but is 
nonetheless of functional importance. In support of this notion, 
AITC caused strong activation of class IV dendritic arborization neu- 
rons, and this activation was abolished in the TrpA1 mutant 
(Supplementary Fig. 17a). Moreover, TrpA1 RNAi expression specif- 
ically in class IV dendritic arborization neurons eliminated the light- 
induced firing change (Supplementary Fig. 18). Taken together, our 
observations suggest that TrpA1 is required cell-autonomously for 
light transduction in class IV dendritic arborization neurons. 

Given the lack of AITC activation of class I or class III dendritic 
arborization neurons (Supplementary Fig. 17b), we expressed TrpA1 
in class I dendritic arborization neurons and found that it conferred 
AITC sensitivity but not light response (Supplementary Fig. 17c, d), 
indicating that TrpA1 is not sufficient for light sensing. Because trans- 
heterozygotes carrying one mutant allele of TrpA1 and one copy of the 
MiET1 P-element insertion in the Gr28b gene showed reduced light 
response (Supplementary Fig. 19), it is likely that Gr28b and TrpA1 
function in the same phototransduction pathway. 


Class IV neurons mediate light avoidance behaviour 

To test whether class IV dendritic arborization neurons are involved 
in light avoidance, we genetically ablated class IV dendritic arboriza- 
tion neurons of third instar larvae by expressing the pro-apoptotic 
genes Hid (ref. 15) and reaper (rpr) (ref. 37) via ppk-GAL4 (ppk-GAL4; 
UAS-Hid,rpr) (Supplementary Fig. 1). We also constructed a line 
lacking Bolwig organs as well as class IV dendritic arborization 
neurons (UAS-Hid,rpr; GMR-Hid; ppk-GAL4). Notably, both lines 
showed markedly decreased white-light-avoidance behaviour com- 
pared to wild-type and GMR-Hid (Bolwig-organ-ablated) larvae 
(Fig. 5a, b). Class IV dendritic-arborization-neurons-ablated animals 
showed a significant decrease of avoidance versus wild type, for all 
white light intensities tested (Fig. 5c-g). Ablation of class IV dendritic 
arborization neurons in animals lacking Bolwig organs produced a 
further decrease in white light avoidance (Fig. 5d-g). Avoidance of 
high-intensity (>0.57 mW mm⁻²) white light was normal when 
Bolwig organs were ablated in wild type (Fig. 5e-g), and in control 
strains with either GAL4 or UAS (Fig. 5f). Taken together with similar 
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Figure 4 | Gr28b and TrpA1 are essential for class IV dendritic arborization 
neuron light responses. a, No significant defects were detected between wild- 
type and mutants of known phototransduction molecules with 340, 380, 402, 
470, or 620 nm light. n = 5-10. b, Reduced light response of class IV dendritic 
arborization neurons in MiET1 and PBac larvae. n = 8-29. c, Reduced light 
response of class IV dendritic arborization neurons in MiET1/deficiency larvae. 
n = 5-12. d, Precise excision of MiET1 P-element insertion restores light 
response in class IV dendritic arborization neurons. n = 6-9. e, Reduced light 
responses of class IV dendritic arborization neurons with Gr28b RNAi 
knockdown. n = 5-8. f, Abolished light responses of class IV dendritic 
arborization neurons in TrpA1⁻/⁻ mutants. n = 8-13. g, MARCM analysis of 
TrpA1⁺/⁻ and TrpA1⁻/⁻ class IV dendritic arborization neurons’ response to 
light. n = 5-8. For a-g, light intensities (mW mm⁻²) are: 1.15 (340 nm), 5.79 
(380 nm), 11.4 (402 nm), 52.8 (470 nm), 43.4 (525 nm), 29.6 (620 nm) and 94.7 
(white). For a, b, c, e, *P < 0.05, **P < 0.01, ***P < 0.001; one-way ANOVA 
followed by a Bonferroni post test; for d, f, g, *P < 0.05, **P < 0.01, 
***P < 0.001; two-tailed unpaired t-test. All error bars indicate s.e.m. 


findings with ultraviolet, violet and blue light (Supplementary Figs 
20-22), these results demonstrate that class IV dendritic arborization 
neurons are necessary to elicit photoavoidance at high intensities. It thus 
seems that the Bolwig organs and class IV dendritic arborization 
neurons operate in different light intensity regimes: Bolwig organs are 
tuned to low light, whereas class IV dendritic arborization neurons, 
although also required in low light, are the primary sensors at high intensities. 

Careful examination of ppk-GAL4 revealed additional expression 
in four mouth hook neurons, but not in the central nervous system 
(Supplementary Fig. 23a—d). Laser ablation of these four neurons in 
the GMR-Hid background had no effect on light avoidance behaviour 
(Supplementary Fig. 23e). Therefore, the class IV dendritic arboriza- 
tion neurons in the body wall are the ones important for the light 
avoidance behaviour. 

Pickpocket, a Degenerin/Epithelial sodium Channel (DEG/ENaC) 
family member specifically expressed in class IV dendritic arborization 
neurons~*** (Fig. 3c), has been implicated in locomotion control?" 
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Figure 5 | Class IV dendritic arborization 
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However, nose-touch experiments” revealed that larvae lacking class 
IV dendritic arborization neurons responded normally to gentle touch 
by retracting or turning away their heads (Supplementary Fig. 24). 
Moreover, direct recording of class IV dendritic arborization neurons 
in ppk mutant larvae revealed no defect in light response (Supplemen- 
tary Fig. 15b). These results demonstrate that reduced light avoidance 
in class IV dendritic-arborization-ablated larvae is not due to non- 
specific effects. 

To probe sufficiency, we expressed channelrhodopsin-2 (ChR2), a 
retinal-dependent cation channel gated by light from ultraviolet to 
green****, specifically in class IV dendritic arborization neurons. 
ChR2 conferred green light sensitivity to dendritic arborization neu- 
rons from larvae fed with retinal (Supplementary Fig. 25), as well as 
robust avoidance of green light of retinal-fed larvae without Bolwig 
organs (Fig. 5h). Thus, activation of class IV dendritic arborization 
neurons is sufficient to induce avoidance. 

With or without Bolwig organs, TrpA1 mutant larvae showed defi- 
cient avoidance of 1 mW mm⁻² white light (Fig. 5i). Moreover, redu- 
cing TrpA1 expression in class IV dendritic arborization neurons by 
RNAi was sufficient to abolish the light avoidance behaviour in animals 
without Bolwig organs (Supplementary Fig. 26). Together, our physio- 
logical and behavioural studies indicate that a light transduction path- 
way involving TrpA1 and Gr28b in class IV dendritic arborization 
neurons is necessary for light avoidance. 


Discussion 

Extra-ocular photoreceptors, previously found in reptiles, birds, amphi- 
bians and fish, provide a good measure of ambient light luminance and 
serve mainly non-image-forming functions such as phototaxis, circadian 
photo-entrainment, pupil reflex, shadow reaction and magnetic orienta- 
tion’’. Usually, these extra-ocular photoreceptors have much lower light 
sensitivity and slower kinetics than ocular photoreceptors”. 

Drosophila larvae have primitive eye structures, the Bolwig organs, 
which control avoidance of dim light’*. Here we report that the class IV 
dendritic arborization neurons, previously implicated in mechano- 
sensory response and motion control’**', are surprisingly also 
photoreceptors. Our behavioural analysis suggests that Bolwig organs 
and class IV dendritic arborization neurons have different regimes of 
light sensing in acute photoavoidance. Bolwig organs, packed with 
photopigments”, are preferentially required for avoidance of low light. 
Class IV dendritic arborization neurons, which also contribute to low 
light avoidance, are the primary sensors at sunlight-level intensities. 


channelrhodopsin-2; rpr, reaper; NS, not significant. 


This organization ensures that larvae can detect the full range of ambi- 
ent light intensities, from dim to strong. 

Class IV dendritic arborization neurons have the intrinsic ability to 
sense light, even after isolation in culture, and their dendrites are 
capable of sensing light (Fig. 3 and Supplementary Fig. 10). Importantly, 
the dendrites of class IV dendritic arborization neurons have complete 
and non-redundant coverage of the body wall (Fig. 3c), allowing 
animals to perceive illumination of any body part, and initiate an 
appropriate behavioural response. Larvae spend much of the time with 
their heads digging into food, making their Bolwig organs on the head 
less likely to be exposed to light. Thus, the ability to sense light with 
sensory neurons tiling the body wall is critical for detection of exposure. 

Class IV dendritic arborization neurons use a novel light transduction
pathway. As in C. elegans, a putative chemosensory G-protein-coupled
receptor, Gr28b, is involved in phototransduction in class IV
dendritic arborization neurons (Fig. 4b-e). TrpA1 is also essential
(Fig. 4f, g and Supplementary Fig. 18). Drosophila larval class IV 
dendritic arborization neurons may function as nociceptors****”. 
They are required for thermal and mechanical nociception, and 
activation of class IV dendritic arborization neurons is sufficient to 
induce a behaviour pattern similar to nocifension****’. Given that 
class IV dendritic arborization neurons are required for larvae to 
avoid harmful light stimuli, these neurons seem to be poised to alert 
the animal to a variety of adversities. 

Our study has uncovered unexpected light-sensing machinery, 
which could be critical for foraging larvae to avoid harmful sunlight, 
desiccation and predation. By providing a precedent for photoreceptors
strategically placed away from the eyes, our finding of an array of class 
IV dendritic arborization neurons with elaborate dendrites tiling the 
entire body wall, and acting as light-sensing antennae, raises the ques- 
tion of whether other animals with eyes might also possess extra-ocular 
photoreceptors for more thorough light detection and behavioural 
response. 


METHODS SUMMARY 

Light avoidance assay. Light avoidance was scored if the third instar larva 
reversed in direction or turned its head completely away from the 1.7-mm light 
spot on its head during the 5-s illumination. Two-tailed Fisher exact test (20–40
larvae per condition), *P < 0.05, **P < 0.01, ***P < 0.001.
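
The scoring statistics above can be illustrated with a short sketch. The Python code below is an illustrative reimplementation, not the authors' analysis script: it computes a two-tailed Fisher exact P value for a 2 × 2 table of responders and non-responders and maps it to the asterisk notation. The function names and the example counts are hypothetical.

```python
from math import comb


def fisher_exact_two_tailed(a, b, c, d):
    """Two-tailed Fisher exact P value for the 2x2 table [[a, b], [c, d]].

    Rows are conditions (e.g. control, mutant); columns are
    (avoided, did not avoid).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2
    denom = comb(total, col1)

    def prob(x):
        # Hypergeometric probability of x responders in the first row
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum all tables that are no more probable than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1)
            if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def significance_stars(p):
    """Map a P value to the asterisk notation used in the figures."""
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "NS"


# Hypothetical counts: 24 of 30 control larvae and 9 of 30 mutant
# larvae scored as avoiding the light spot.
p_value = fisher_exact_two_tailed(24, 6, 9, 21)
print(p_value, significance_stars(p_value))
```

Because each larva is tested once and scored as a yes/no response, a test on counts such as this one suits the 20–40 animals per condition better than a test that assumes normally distributed measurements.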

Electrophysiology. Action potentials were monitored via extracellular recordings 
from a third instar larval fillet with muscles removed, using an Axon 700B 
amplifier and pCLAMP 10 software. 

GCaMP3 imaging. Third instar larval fillets were imaged on a Zeiss LS510
META confocal microscope with an Olympus ×40/0.8 NA water immersion
objective, with a 488-nm laser. GCaMP3 cDNA is available from AddGene.

Cell culture. Embryos homozygous for Canton S (Cs); UAS-GCaMP3; ppk- 
GAL4, UAS-RFP (for class IV dendritic arborization neuron culture) or Cs; 
UAS-GCaMP3; 19-12-GAL4, UAS-RFP (for class III dendritic arborization neuron 
culture) were used for culture”””®. 

MARCM analysis. We recorded from GFP-marked class IV dendritic arborization
neuron clones lacking TrpA1 (ref. 36).


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Fly stocks. The following fly strains were used: (1) Cs; (2) Cs; GMR-Hid; (3) Cs; 
ppk-GAL4, UAS-Tomato; (4) Cs;; ppk-GAL4, UAS-mCD8::GFP; (5) Cs; ppk- 
GAL4, UAS-mCD8::RFP; (6) Cs;; TrpA1; (7) w; UAS-TrpA1 RNAi; (8) Cs; UAS- 
Dicer; ppk-GAL4, UAS-mCD8::GFP; (9) Cs; 21-7-GAL4, UAS-GCaMP3; (10) Cs; 
UAS-GCaMP3; ppk-GAL4, UAS-mCD8::RFP; (11) Cs; UAS-GCaMP3; 19-12-GAL4,
UAS-mCD8::RFP; (12) elav-GAL4, hsFLP, UAS-mCD8::GFP,;tub-GAL80, FRT™;
(13) w;; TrpA1, FRT**/TM6B, Tb; (14) Cs; MiET103888
(Bloomington stock centre no. 24190); (15) Cs; PBac01884 (Bloomington stock 
centre no. 10743); (16) w; UAS-Gr28b RNAi (VDRC stock centre no. kk101727); 
(17) Cs;; cry" (ref. 28); (18) Cs;; Rh3’; (19) Cs;; Rh4"; (20) Cs; Rh5; (21) Cs;; Rh6';
(22) ninaE”’; (23) norpA’”*; (24) Df(2L)Exel7031; (25) yw, UAS-Hid,rpr; (26) Cs; 
Channelrhodopsin-2; (27) UAS-TrpA1; (28) Cs; trpP°?, (29) Cs; painless’. 

Light avoidance assay. Animals were raised at 25 °C in an incubator with 12h 
light/dark cycles and humidity control (Darwin Chamber Company). Ninety-six 
hours after egg laying (AEL), third instar larvae were gently picked up from the 
vial, washed twice with PBS and transferred to a 100-mm Petri dish with fresh 2% 
agarose. Excessive water was removed from the animals. Animals were allowed to 
rest on the plate for at least 3 min before testing. Only animals making straight 
forward movement were selected for the assay. Each animal was tested once. The 
assay was carried out with a Stereo Microscope system (Leica M205FA). Unless 
otherwise specified, light was delivered from a 300 W xenon lamp (Sutter LB-LS/30)
through a PLANAPO ×1 objective (Leica) at ×160 magnification, yielding a
light spot of 1.7 mm in diameter. To direct the light to the animal’s head, the plate 
was manually moved so that only the head appeared in the field of view. An 
avoidance response was scored when animals stopped forward movement during 
the 5-s light illumination by either initiating backward movement or turning their 
heads completely away from the light spot. The 5-s light illumination was con- 
trolled by a shutter (Sutter Instruments) in the xenon lamp house triggered by an 
external stimulator (Grass s88). The xenon lamp has a light intensity spectrum 
similar to sunlight. The background light for visualizing animals was filtered to 
red, to which they are insensitive (Lee filter no. 027, medium red), and the entire 
event was recorded through a lens (Fujinon, 25mm 1:1.4) to a CCD camera 
(Qimaging Rolera XR) at 6 frames per second. The camera was mounted on a 
tripod and placed on the side of the Petri dish, with the front of the lens covered 
with red filters (Lee filter no. 027, medium red) to avoid overexposure. For violet, 
blue, green or red light, the band-pass excitation filter (in nm: 402 ± 7.5,
470 ± 20, 525 ± 25, 620 ± 30) was placed in the xenon lamp house, and a filter
set with empty excitation filter was placed into the Leica scope. White light 
illumination was achieved the same way except that no excitation filter was placed 
in the xenon lamp house. For 360 nm illumination, a HXP-120 light source 
(Visitron Systems) was used, and a 360 ± 20 nm excitation filter set was placed
into the Leica scope. The light intensity at ×160 magnification was measured by a
radiometric sensor head (Newport 818P-001-12) coupled with a power meter 
(Newport 1918-C). Because liquid light guides connecting the light source and the 
microscope ensured the uniformity of light, the light intensity was calculated by 
dividing the measured light power by the illuminated area (2.27 mm²). Temperature changes
associated with light illumination were measured with an IT-24P thermocouple 
probe coupled with a BAT-10 thermometer (Physitemp). A red filter (Lee filter 
no. 027, medium red) was used to cover the eyepieces of the microscope to protect 
experimenters’ eyes from strong light. Light avoidance of animals expressing 
channelrhodopsin-2 (ChR2) to 0.25 mW mm⁻² green light was assayed the same
way with the exception that eggs were laid and allowed to develop to third instar 
in food medium supplemented with 0.2 mM retinal. Twenty to forty animals were 
tested in each condition and the percentage of positive responses was calculated. 
A two-tailed Fisher exact test was performed and statistical significance was 
assigned, *P < 0.05, **P < 0.01, ***P < 0.001.
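
The intensity calculation described above (measured power divided by the area of the uniformly illuminated spot) can be sketched as follows. This is a minimal illustration under the stated assumption of uniform illumination; the function names and the example power reading are hypothetical, and only the 1.7-mm spot diameter comes from the text.

```python
import math


def spot_area_mm2(diameter_mm):
    """Area of a circular light spot, in mm^2."""
    return math.pi * (diameter_mm / 2) ** 2


def intensity_mw_per_mm2(power_mw, diameter_mm=1.7):
    """Intensity of a uniformly illuminated spot, in mW per mm^2.

    power_mw is the total power read from the power meter (hypothetical
    input); uniform illumination across the spot is assumed.
    """
    return power_mw / spot_area_mm2(diameter_mm)


# The 1.7-mm spot used in the behavioural assay
area = spot_area_mm2(1.7)
print(f"spot area: {area:.2f} mm^2")

# Hypothetical reading: 2.27 mW total power over that spot
print(f"intensity: {intensity_mw_per_mm2(2.27):.2f} mW mm^-2")
```

The computed area of a 1.7-mm-diameter circle agrees with the 2.27 mm² stated in the text.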

Cell culture. Dissociated cell cultures were prepared from early gastrulas of 
Drosophila melanogaster, as described previously. Briefly, embryos were collected
on grape agar plates with yeast paste and incubated for another 3.5 h at
25 °C. After removing yeast paste carefully, embryos were washed extensively
with 500 ml sterile H₂O. To remove the chorionic membrane and sterilize the
embryos, embryos were treated with bleach/90% EtOH (1:1 by volume, final
concentration of sodium hypochlorite is 3%) solution for 1 min and then washed
with 500 ml sterile H₂O to remove residual bleach and EtOH. After washing,
embryos were homogenized in Schneider's medium supplemented with 5%
FBS, 0.2 µg ml⁻¹ insulin, penicillin (50 units ml⁻¹) and streptomycin (50 µg
ml⁻¹), with a 15-ml Dounce homogenizer containing 6 ml medium. Three to
five rotary up and down strokes were used to dissociate the cells. To remove
undissociated clumps and large debris, dissociated cells were filtered through a
40-µm cell strainer. Filtrate was collected in a 15-ml centrifuge tube, and cells
were centrifuged at 2,000 r.p.m. for 5 min. Cells were washed in the medium and 
centrifuged again as described, followed by a final suspension in 10 ml medium. 
Cells were plated on poly-L-lysine-treated coverglass, and allowed to develop at
room temperature for 2-4 days before GCaMP3 imaging was performed.
Embryos carrying Cs; UAS-GCaMP3; ppk-GAL4, UAS-mCD8::RFP were used 
for culturing class IV dendritic arborization neurons, and embryos carrying Cs; 
UAS-GCaMP3; 19-12-GAL4, UAS-mCD8::RFP were used for culturing class III 
dendritic arborization neurons. 

Electrophysiology. Fillets were made from 96-h-AEL third instar larvae with 
cuticle facing down in the external saline solution composed of (in mM): NaCl
120, KCl 3, MgCl₂ 4, CaCl₂ 1.5, NaHCO₃ 10, trehalose 10, glucose 10, TES 5,
sucrose 10, HEPES 10. Osmolality was 305 mOsm kg⁻¹ and the final pH was 7.25.
Muscles covering the neurons of interest were gently digested by proteinase 
(Sigma). During the enzymatic treatment, a laminar flow of external saline solu- 
tion was turned on to remove excess enzyme. No detectable difference in dendrite 
morphology of class IV dendritic arborization neurons was observed before and 
after muscle digestion, indicating that neurons were intact. An Olympus BX51WI 
microscope with a ×40/0.8 NA water immersion objective was used to obtain
recordings with the help of IR-DIC optics and a CoolSNAP CCD (Photometrics). 
Recording pipettes were pulled with a P-97 puller (Sutter Instruments) from
thin-wall borosilicate glass (World Precision Instruments), filled with external saline
solution, with a tip opening of 5 µm. Gentle negative pressure was delivered to
suck the soma to get good signal-to-noise ratio of recording traces. Recordings 
were performed with a Multiclamp 700B amplifier (Molecular Devices), and data 
were acquired with Digidata 1440A (Molecular Devices) and Clampex 10.0 soft- 
ware (Molecular Devices). Extracellular recordings of action potentials were 
obtained in voltage clamp mode with a holding potential of 0 mV, with a 2 kHz 
low-pass filter and sampled at 20 kHz. During recording, no background light 
illumination was applied. A 300-W xenon light source was connected to the 
microscope with a liquid light guide to provide light stimulation through a 
×40/0.8 NA water-immersion lens, yielding an evenly illuminated light spot
600 µm in diameter, which covered the entire class IV dendritic arborization neuron.
Dendritic illumination was achieved by decreasing the field diaphragm to
cover about 50% of dendrites (Fig. 3d), without illuminating the soma. Light 
intensity was measured the same way as in the behaviour assay and intensity 
density was calculated. Neutral density filters (Chroma) were used to reduce light 
intensity to generate dose-response curves. The duration (5 s) of light illumina- 
tion was controlled by a shutter in the xenon lamp house triggered by Digidata 
1440A (Molecular Devices). Band-pass excitation filters (Chroma) were used to 
select light wavelength: they were (in nm) 340 ± 10, 380 ± 10, 402 ± 7.5,
470 ± 20, 525 ± 25, 620 ± 30. White light illumination was achieved the same
way with the exception that no excitation filter was placed in the xenon lamp
house. For each recording trace, the average frequency during the 5 s immediately
before light exposure was used as control. Five-second light stimulation was 
controlled by a TTL-triggered shutter (Sutter Instruments) in the xenon lamp 
house. For latency analysis, only neurons with low spontaneous firing were 
included for recording, and latency was defined as the time between onset of 
light and onset of burst firing. To record temperature-induced firing change, pre- 
heated solution was perfused into the recording chamber. Temperature was 
monitored by the thermal probe connected with the thermometer (Warner TC-324B).
mSIRK (EMD Biosciences) or L-cis-diltiazem (Sigma) was incubated in the
recording chamber for 30 min before recording. A two-tailed paired or unpaired 
t-test, or one-way ANOVA, followed by the Bonferroni multiple comparison test, 
was performed and statistical significance was assigned, *P < 0.05, **P < 0.01,
***P < 0.001. 
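
The firing-rate comparison and latency measurement described above can be sketched in Python as follows. This is an illustrative outline rather than the authors' analysis code: it assumes spike times (in seconds) have already been extracted from the recording, and it approximates the onset of burst firing by the first spike after light onset, which is reasonable only for the low-spontaneous-firing neurons selected for latency analysis. All names and example values are hypothetical.

```python
def mean_rate(spike_times, start, stop):
    """Mean firing frequency (Hz) in the half-open window [start, stop)."""
    count = sum(1 for t in spike_times if start <= t < stop)
    return count / (stop - start)


def light_response(spike_times, light_on, duration=5.0):
    """Return (baseline Hz, evoked Hz, change in Hz).

    The baseline is the window of the same duration immediately
    before light onset, as in the Methods.
    """
    baseline = mean_rate(spike_times, light_on - duration, light_on)
    evoked = mean_rate(spike_times, light_on, light_on + duration)
    return baseline, evoked, evoked - baseline


def latency(spike_times, light_on):
    """Time from light onset to the first subsequent spike.

    Simplifying assumption: the first spike after light onset marks
    the onset of burst firing. Returns None if no spike follows.
    """
    after = [t for t in sorted(spike_times) if t >= light_on]
    return after[0] - light_on if after else None


# Hypothetical spike times (s); light turned on at t = 5 s
spikes = [1.0, 3.0, 5.5, 5.6, 5.7, 6.0, 7.0, 8.0, 9.0, 9.5]
print(light_response(spikes, light_on=5.0))
print(latency(spikes, light_on=5.0))
```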

GCaMP3 imaging experiments. Homozygous Cs; 21-7-GAL4, UAS-GCaMP3 
animals were used for imaging. The 21-7 promoter drives Gal4 expression in 
all peripheral nervous system sensory neurons in the larval body wall except 
the chordotonal organ (H. H. Lee, Y.X., L.Y.J. and Y.N.J., unpublished data). 
Fillet preparation was the same as the one used in recording with the exception 
that the cuticle was facing up to mimic the orientation of larvae in receiving 
natural light. No enzymatic digestion was performed. Data were collected on a 
Zeiss LS510 META confocal microscope with an Olympus ×40/0.8 NA water
immersion objective. GCaMP3 fluorescence was excited with a 488-nm laser.
Laser scanning by itself did not activate class IV dendritic arborization neurons, as
evidenced by the flat baseline of GCaMP3 signals during the 0-60 s control period
(Fig. 2e). The images were acquired at 512 × 512 pixels at 12-bit dynamic range.
The 5-s light stimulation (91, 68 and 96 mW mm⁻² for 365 nm
ultraviolet, 470 nm blue and 546 nm green light, respectively) was controlled by
manually switching the filter cube from laser-scanning position to epifluorescence
position. Two seconds of light stimulation elicited results similar to those of five
seconds (data not shown). The average GCaMP3 signal during the 60 s before light
stimulation was taken as F₀, and ΔF/F₀ was calculated for each data point. GCaMP3
signals from the soma were analysed, although axons and dendrites also showed 
responses. As a control, Cs; UAS-GCaMP3; ppk-GAL4, UAS-mCD8::RFP animals 
were raised to look for nonspecific effects of excitation. In these animals, we did
not detect any change in RFP signals in response to light (data not shown). 
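
The normalization of the GCaMP3 signal described above (baseline taken as the mean of the 60 s before stimulation, then the fractional change computed for each data point) can be sketched as follows. This is an illustrative example, not the authors' code; the function name and the sample trace are hypothetical.

```python
def delta_f_over_f0(trace, times, baseline_end=60.0):
    """Return the fractional fluorescence change for each data point.

    trace: fluorescence values (e.g. mean soma intensity per frame)
    times: acquisition time of each frame, in seconds
    baseline_end: end of the pre-stimulation control period (s)
    """
    baseline = [f for f, t in zip(trace, times) if t < baseline_end]
    if not baseline:
        raise ValueError("no frames fall within the baseline period")
    f0 = sum(baseline) / len(baseline)
    return [(f - f0) / f0 for f in trace]


# Hypothetical trace: flat baseline, then a rise after light at 60 s
times = [0, 30, 59, 60, 65]
trace = [100.0, 100.0, 100.0, 150.0, 200.0]
print(delta_f_over_f0(trace, times))
```

Dividing by the baseline makes responses comparable between neurons that express the indicator at different levels.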

For GCaMP3 imaging of cultured neurons, class IV or class III dendritic 
arborization neurons were identified by the co-expressed RFP signals. Light 
stimulation was carried out as described above. For AITC stimulation, an equal 
volume of 200 µM AITC was manually applied to the chamber, making the final
concentration 100 µM.

MARCM analysis. MARCM analysis was carried out as described previously”*. 
TrpA1 mutant class IV dendritic arborization neurons with GFP signals were
selected for recording. For TrpA1+/− heterozygous neurons without GFP
expression, class IV dendritic arborization neurons were identified by location. 

Nose-touch assay. Assay was performed as described previously”. Briefly, the 
larvae were touched with an eyebrow hair affixed to the tip of a dissecting needle. 
The scoring system is as follows: 0 = no response to touch; 1 =a response of 
pausing mouth hook movement; 2 = responding by withdrawing the anterior or 
turning away from the touch; 3 = a single reverse peristaltic wave away from the 
touch; and 4 = multiple peristaltic waves away from the touch. A two-tailed 
Fisher exact test was performed. 
TRIM24 links a non-canonical histone
signature to breast cancer 


Wen-Wei Tsai’*, Zhanxin Wang**, Teresa T. Yiu'*, Kadir C. Akdemir*, Weiya Xia‘, Stefan Winter®, Cheng-Yu Tsai’, 
Xiaobing Shi>?, Dirk Schwarzer®, William Plunkett’, Bruce Aronow”, Or Gozani’, Wolfgang Fischle°, Mien-Chie Hung*", 
Dinshaw J. Patel? & Michelle Craig Barton’? 


Recognition of modified histone species by distinct structural domains within ‘reader’ proteins plays a critical role in the 
regulation of gene expression. Readers that simultaneously recognize histones with multiple marks allow transduction of 
complex chromatin modification patterns into specific biological outcomes. Here we report that chromatin regulator 
tripartite motif-containing 24 (TRIM24) functions in humans as a reader of dual histone marks by means of tandem plant 
homeodomain (PHD) and bromodomain (Bromo) regions. The three-dimensional structure of the PHD-Bromo region of 
TRIM24 revealed a single functional unit for combinatorial recognition of unmodified H3K4 (that is, histone H3 
unmodified at lysine 4, H3K4me0) and acetylated H3K23 (histone H3 acetylated at lysine 23, H3K23ac) within the 
same histone tail. TRIM24 binds chromatin and oestrogen receptor to activate oestrogen-dependent genes associated 
with cellular proliferation and tumour development. Aberrant expression of TRIM24 negatively correlates with survival 
of breast cancer patients. The PHD-Bromo of TRIM24 provides a structural rationale for chromatin activation through a 
non-canonical histone signature, establishing a new route by which chromatin readers may influence cancer
pathogenesis.


Post-translational modifications of histones occur in combinations that 
must be faithfully translated by effector proteins, or histone readers'~*. 
The lexicon of histone modifications may be highly context-dependent, 
influenced by inductive signalling, cellular milieu and target gene status*. 
Misinterpretation or imbalance in this hierarchical arrangement has dire
consequences for cellular homeostasis, leading to developmental 
problems, hereditary disease or tumour development’. Linked histone 
reader modules, such as tandem PHD finger and bromodomain, occur 
frequently in proteins that interact with histone, but little is known about 
their mechanisms of action. Combinatorial readout of histone post- 
translational modifications (PTMs) may enhance binding between spa- 
tially separated histone marks, or even create communication links 
between domains or members of the complex’. Individually, proteins 
with bromodomains—for example, TAF1 and BDF1—associate with 
acetylated lysines with broad specificity®’, while PHD-containing 
proteins are less predictable in their interactions’ *. The PHD fingers 
of BHC80 and AIRE interact with unmethylated H3K4 (H3K4me0)*”, 
while other previously reported PHD finger domains bind methylated 
proteins as modifiers of histones or as subunits of chromatin- 
remodelling, co-activator or co-repressor complexes'*!°”, 
PHD-finger proteins and their dysregulation are linked to a broad 
spectrum of human diseases, underscoring an essential role in home- 
ostasis’. Recently, aberrant localization of a JARID1A PHD finger- 
fusion protein was shown as directly causal in transformation and 
development of haematopoietic malignancy, which is a process 
requiring fusion protein recognition of H3K4me3 via the JARID1A 
PHD finger. Here, we present evidence that a multi-functional
protein, TRIM24, which is an E3-ubiquitin ligase that targets p53
(ref. 19) and is broadly associated with chromatin silencing”, relies 
on PHD-Bromo to recognize specific, combinatorial histone modifi- 
cations and activate oestrogen-dependent genes associated with cel- 
lular proliferation and tumour development. Genome-wide analysis 
of chromatin interactions shows oestrogen-dependent binding of 
TRIM24 and oestrogen receptor α (ERα) at sites that paradoxically
exhibit oestrogen-activated loss of H3K4me2 and gain of histone 
acetylation. Importantly, aberrant overexpression of TRIM24 in 
breast cancer patients is frequent and directly correlated with poor 
survival. 


TRIM24 PHD-Bromo binds amino-terminal H3 tail 


TRIM24 belongs to the TRIM/RBCC protein family, characterized by a 
conserved, N-terminal tripartite motif (a RING domain, B-box
zinc-fingers and a coiled-coil region) as well as variable carboxy-terminal
domains. TRIM24 was originally identified as transcriptional
intermediary factor (TIF) 1α, a ligand-dependent co-repressor
of retinoic acid receptor that interacts with multiple nuclear receptors
in vitro via an LXXLL motif. In addition to its LXXLL motif and RING
domain, TRIM24 has a C-terminal PHD-Bromo (Fig. 1a), which probably
recognizes histones or non-histone proteins with specific combinations
of post-translational modifications.

Figure 1 | TRIM24 PHD finger interacts with unmethylated H3K4.
a, Diagram of TRIM24 protein domains. b, Biotinylated peptide pulldowns:
recombinant PHD fingers and histone peptides. c, GST-pulldowns:
recombinant proteins and native histone proteins, PHD-Bromo (PB). d, ITC
titration: binding of TRIM24 PHD-Bromo with histone peptides.

Protein sequence alignment of the PHD fingers of TRIM24 and
BHC80 with ING1, a PHD domain that recognizes H3K4me3 (refs
24,25), showed TRIM24 as highly similar to BHC80 with conservation
of residues critical for BHC80-H3K4me0 interactions
(Supplementary Fig. 1a). Accordingly, we found that full-length
TRIM24 interacts with histone proteins specifically through its 
PHD-Bromo (Supplementary Fig. 1b). Binding of the TRIM24 
PHD-Bromo to histone peptide arrays occurs at unmodified H3 (resi- 
dues 1-21), methylated H3K9 (H3K9me) and acetylated H3K9/K14 
peptides, but not methylated H3K4 residues (Supplementary Fig. 1c). 
Similarly, TRIM24 PHD finger and PHD-Bromo bind unmodified 
histone H3 (residues 1-21) but not methylated H3K4, similar to 
BHC80 but unlike ING1, which preferentially binds to H3K4me pep- 
tides (Fig. 1b and Supplementary Fig. 1d). Glutathione S-transferase 
(GST)-pulldown assays with native histones confirmed that TRIM24 
PHD finger, bromodomain, PHD-Bromo and the BHC80 PHD fail to 
bind to native histone H3 with K4 trimethylation (H3K4me3) but 
tolerate H3K9me2 modification (Fig. 1c and Supplementary Fig. 1e).
Isothermal titration calorimetry (ITC)-based binding assays established
that the PHD-Bromo binds unmodified H3(1-15)K4 with a
dissociation constant, K_D, of 8.6 µM, while methylation of H3K4
greatly decreases binding affinity of TRIM24 and H3 peptides 
(Fig. 1d and Supplementary Table 2). These results suggest that 
TRIM24 PHD-Bromo interacts with the N-terminal tail of histone 
H3, but that specific PTMs, for example, methylation of H3K4, interfere
with this interaction.


Structural basis of H3 readout by TRIM24 


We have determined the three-dimensional crystal structure of the 
PHD-linker-Bromo segment (residues 824-1006) of human TRIM24 
in free and histone peptide bound states. The overall structure of 
TRIM24 PHD-Bromo in the free state demonstrates that PHD and 
bromodomain interact extensively and form an integrated structural 
unit (747 Å² of contact surface), connected by a long linker and stabilized
by a network of hydrogen bonding and hydrophobic interactions
(Fig. 2a, Supplementary Fig. 2, Supplementary Table 1). The
TRIM24 PHD finger residues 824-871 adopt the typical PHD finger 
‘cross-braced’ topology stabilized by a pair of coordinated zinc ions, 
which together with residues 872-884 from the linker region form an 
extended TRIM24 PHD domain. The TRIM24 bromodomain adopts 
the typical left-handed four-helical bundle characteristic of other 
members of this family. 

The 2.0 Å co-crystal structure of TRIM24 PHD-Bromo and unmodified
H3(1-10)K4 peptide (Supplementary Table 1 and Supplementary
Fig. 3a) showed that the first nine residues of bound H3 peptide are 
positioned within a surface groove of the PHD finger (Fig. 2b and 
Supplementary Fig. 3b). The R2 to Q5 segment of bound H3 peptide 
forms an anti-parallel β-sheet with the E837 to C840 segment of the
PHD finger, while the T6 to K9 segment of bound H3 peptide contacts 
the N834 to G836 segment of the PHD finger. The side chain of R2 is 
hydrogen-bonded with the backbone carbonyl of C841. The side chain 
of C840 is positioned in-between the side chains of R2 and K4, with the 
C840W mutation losing its ability to bind unmodified H3K4 peptide 
(K_D > 400 µM, Supplementary Table 2 and Supplementary Fig. 4).

The unmodified lysine ammonium group of H3K4 forms two 
direct hydrogen bonds with backbone carbonyl oxygens of N825 
and E826 (Fig. 2b). In addition, the proximally positioned D827 forms 
a stabilizing salt bridge with the unmodified lysine, consistent with the 
observation of impaired binding between D827A mutant and 
unmodified H3K4 peptide (K_D = 133 µM, Supplementary Table 2).
Methylation of H3K4 would create steric clashes with residues lining 
the binding pocket, disrupt the salt bridge interaction with D827, and 
impair hydrogen bonding with N825 and E826, thereby providing a 
structural explanation for the unmodified H3K4 preference of 
TRIM24 PHD-Bromo. 


TRIM24 bromodomain is H3K23ac-specific 


Both sequence and structure-based alignments indicate that TRIM24 
bromodomain is an acetyllysine reader. Peptide pulldown assays and 
NMR titration measurements suggest that TRIM24 bromodomain 
interacts with H3 peptides with K23 or K27 acetylation and several 
acetylated H4 peptides (Supplementary Fig. 5a, b). ITC studies establish 
that TRIM24 PHD-Bromo specifically binds to the H3(13-32)K23ac 
peptide with a K_D value (8.8 µM; Supplementary Table 2) comparable
to that between tetra-acetylated H4 peptide and the double bromodomain
modules of TAF1 or BDF1. 

We solved the 1.9 Å crystal structure of the complex of TRIM24
PHD-Bromo and H3(13-32)K23ac peptide (Supplementary Table 1 
and Supplementary Fig. 6a). Residues 23-27 of the bound H3(13- 
32)K23ac peptide exhibit sequence-specific interactions with TRIM24 
bromodomain (Fig. 2c and Supplementary Fig. 6b). The acetyllysine 
side chain forms a direct hydrogen bond with the side chain of con- 
served N980. Acetyllysine recognition constitutes the binding deter- 
minant, as double mutant F979A/N980A loses most of the binding 
affinity for the H3(13-32)K23ac peptide (Supplementary Table 2). 

Figure 2 | TRIM24 PHD-Bromo simultaneously binds H3K4me0 and
acetylated histone lysines. a, Stereo view of the crystal structure of TRIM24
PHD-Bromo in the free state. b, Detailed interactions between PHD of TRIM24
PHD-Bromo and H3(1-10)K4 peptide. c, Detailed interactions between
bromodomain of TRIM24 PHD-Bromo and H3(22-29)K23ac peptide.
d, Positioning of H3(1-10)K4 and H3(13-32)K23ac peptides on the surface of
TRIM24 PHD-Bromo.

ITC studies establish that H3(1-20)K9ac, H3(1-19)K14ac and
H3(13-32)K27ac bind non-specifically to the TRIM24 bromodomain
(K_D ≈ 200 µM; Supplementary Table 2). The crystal structure of the
complex of TRIM24 PHD-Bromo with H3(23-31)K27ac peptide
(Supplementary Table 1) revealed a single intermolecular hydrogen
bond between the side chains of K27ac and N980, while other histone
residues did not show any direct intermolecular contacts with the 
bromodomain (Supplementary Fig. 7), consistent with weak binding 
affinity of H3(13-32)K27ac peptide. The structure of TRIM24 PHD- 
Bromo bound to H4(14-19)K16ac peptide containing the conserved 
interaction between K16ac and N980 side chains is shown in
Supplementary Fig. 8. 

The structures of TRIM24 PHD-Bromo complexes with acetyllysine- 
containing histone peptides show that acetyllysine invariantly inserts 
into a pre-formed acetyllysine-binding pocket of the bromodomain. 
With the acetyllysine as an anchor, flanking residues determine 
sequence specificity of acetyllysine peptides for the TRIM24 bromodo- 
main. The H3(13-32)K23ac peptide both fits better within the cleft 
between ZA and BC loops, and shows sequence-specific interactions 
with TRIM24 bromodomain spanning K23ac to K27, creating much 
higher affinity for the TRIM24 bromodomain than is shown by other 
acetyllysine-containing peptides. 


Combinatorial readout by TRIM24 PHD-Bromo 


Superimposition of the above structures of complexes revealed that 
H3K4 and H3K23ac peptides are aligned in the same direction on the
surface of the TRIM24 PHD-Bromo (Fig. 2d). The distance between
the Cα of H3K9 and the Cα of H3K23ac is 25.5 Å, which allows one
H3 peptide containing both unmodified H3K4 and H3K23ac to 
simultaneously target the PHD and bromodomain binding sites on 
TRIM24 PHD-Bromo. 

By contrast, H3K4 and H3K27ac (or H4K16ac) peptides are
aligned in opposite directions on the surface of TRIM24 PHD- 
Bromo (Supplementary Fig. 9), which indicates that the TRIM24 
PHD-Bromo requires two histone tails, either within a single nucleo- 
some or from an adjacent pair of nucleosomes, to simultaneously bind 
H3K4 and H3K27ac (or H4K16ac). 

To test the effect of combinatorial readout of TRIM24 PHD-Bromo 
on histone H3 bearing unmodified K4 and acetylated K23 dual marks, 
we synthesized longer H3(1-33) peptides bearing both unmodified K4 
and acetylated K23 marks. For controls, we used H3(1-33)K4me3K23ac, 
as well as H3(1-33)K4 peptides that have only one effective histone mark 
for specific TRIM24 PHD-Bromo recognition. On the basis of ITC 
binding assays, TRIM24 PHD-Bromo showed an approximately 90-fold 
higher binding affinity for H3(1-33)K4K23ac peptide (Fig. 2e, 
K_D = 0.096 µM) compared to the shorter H3(1-15)K4 peptide bearing
only unmodified K4 (K_D = 8.6 µM) or for the H3(13-32)K23ac peptide
bearing only acetylated K23 marks (K_D = 8.8 µM). Without acetylation
on K23, the binding for H3(1-33)K4 is 24-fold weaker (Fig. 2e;
K_D = 2.3 µM); when K4 is tri-methylated, the binding for
H3(1-33)K4me3K23ac is sixfold weaker (Fig. 2e; K_D = 0.56 µM). Similarly,
mutants that disrupt either the PHD finger binding pocket (C840W)
or the bromodomain binding pocket (F979A/N980A) also decreased
binding for H3(1-33)K4K23ac peptide by 6-7 fold (Fig. 2e and
Supplementary Table 2). 
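
The fold differences quoted in this paragraph follow directly from the ratios of the dissociation constants. The short Python sketch below is an illustrative check, not part of the original analysis; it uses only the ITC K_D values given in the text, and the dictionary and function names are hypothetical.

```python
# ITC dissociation constants from the text, in micromolar
KD_UM = {
    "H3(1-33)K4K23ac": 0.096,     # dual mark: unmodified K4 + K23ac
    "H3(1-15)K4": 8.6,            # unmodified K4 only (short peptide)
    "H3(13-32)K23ac": 8.8,        # K23ac only (short peptide)
    "H3(1-33)K4": 2.3,            # long peptide without K23 acetylation
    "H3(1-33)K4me3K23ac": 0.56,   # long peptide with trimethylated K4
}


def fold_weaker(peptide, reference="H3(1-33)K4K23ac"):
    """Ratio of K_D values: how much weaker a peptide binds than the reference."""
    return KD_UM[peptide] / KD_UM[reference]


for name in KD_UM:
    if name != "H3(1-33)K4K23ac":
        print(f"{name}: {fold_weaker(name):.1f}-fold weaker")
```

A higher K_D means weaker binding, so each ratio above 1 expresses how much affinity is lost when one of the two marks is absent or altered.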

On the basis of fluorescence polarization (FP)-based measurement, 
wild-type TRIM24 PHD-Bromo also showed strong binding affinity 
for H3(1-33)K4K23ac peptide (K_D = 0.185 µM); peptides trimethylated
at K4 or without acetylation at K23 displayed 13-23 fold weaker
interaction (Fig. 2f and Supplementary Table 3). Mutation on the 
PHD finger binding pocket (C840W) or the bromodomain binding 
pocket (F979A/N980A) showed similar decrease in binding affinities 
(Fig. 2f and Supplementary Table 3). These binding data strongly 
support our structural results, which indicate that unmodified 
H3K4 and acetylated H3K23 are a pair of natural histone marks 
targeted by TRIM24 PHD-Bromo that can be read in a combinatorial 
manner on a single histone peptide. This combinatorial readout can 
greatly increase the recruitment of TRIM24 to nucleosomes bearing 
these two marks. 


TRIM24 and ERα recruitment to chromatin


Combinatorial histone modifications of unmethylated H3K4 along- 
side acetylated lysines have no straightforward interpretation by the 
models of chromatin modification and regulated activation or repres- 
sion of transcription. We considered a model where TRIM24 regulates 
gene expression by specific binding to chromatin with non-canonical 
combinations of PTMs, and focused on co-regulation of ER. This was 
because in vitro interactions between TRIM24 and nuclear receptors, 
including ERα, are ligand-dependent (Supplementary Fig. 10 and ref. 
26), and because ligand-activated, ER-response elements (EREs) are 
notably independent of H3K4me2 and H3K4me3 modifications. 
We used chromatin immunoprecipitation (ChIP) and sequential ChIP 
analyses of ERα-positive MCF7 breast cancer cells to assess whether 
TRIM24 is recruited with ERα to specific EREs of the GREB1, PR and 
pS2/TFF1 genes (Fig. 3a, b, and Supplementary Fig. 11). 
Oestrogen-activated recruitment occurs robustly within 15 min, and by six hours 
yields a sevenfold increase of ERα binding and a sixfold increase of 
TRIM24 binding at the GREB1 distal ERE, ~40 kilobases (kb) 
upstream of the transcription start site (Fig. 3a). ChIP analysis of 
H3K4me2/3 after oestrogen treatment indicates that quantified 
H3K4me2 and H3K4me3 levels decreased at distal ERE sites 
(Supplementary Fig. 12 and ref. 27) and, when normalized for 
nucleosomal occupancy, were decreased or unchanged at distal EREs (Fig. 3c 
and Supplementary Fig. 13). Importantly, TRIM24 is recruited in the 
absence of changes in H3K4 methylation. In contrast, H3K23ac, 
H3K27ac and H4ac, which are targeted by the TRIM24 bromodomain, 
are enriched at both distal and proximal EREs after oestrogen addition 
(Fig. 3d). These findings suggest that, in response to oestrogen, TRIM24 
interacts with ERα and with chromatin that lacks H3K4 methylation but is 
enriched in lysine acetylation, as suggested by our structural analyses. 

These findings stand in contrast to a model of chromatin accessibility 
at ER binding sites, facilitated by FOXA1 and H3K4me2 enrichment in 
response to oestrogen treatment, but are in agreement with findings 
that H3K4me3 is not present at a majority of distal ERE regions. We 
evaluated global chromatin association of TRIM24, ER and 
H3K4me2 by ChIP and deep sequencing of antibody-enriched DNA 
fragments (ChIP-seq). These analyses revealed binding of TRIM24 and 
ER at more than 10,000 sites genome-wide, half of which, in each case, 
are oestrogen-dependent (Fig. 3e and Supplementary Fig. 14a). Shared 
target sites of ERα and co-regulator TRIM24 increase dramatically 
(eightfold) in response to oestrogen (Supplementary Fig. 14b), are 
highly enriched (P value <0.001) at genes regulated by oestrogen 
(Supplementary Fig. 14c), and function in cell cycle, kinase activity 

Figure 3 | TRIM24 is recruited with ERα to ERE sites depleted of H3K4me2. 
a, ChIP of ERα and TRIM24 binding at EREs of GREB1, 15 min and 6 h after 
treatment with oestradiol (E2). Vehicle, EtOH. b, Sequential ChIP: ERα and 
TRIM24, 6 h after E2 addition. c, d, ChIP for H3 and histone modifications, 
15 min and 6 h after E2, normalized for H3. Each bar represents averaged 
results, n = 3 biological replicates, assayed 3 times each; error bars, s.d. 
e, Genome-wide TRIM24 and ERα binding sites in MCF7 cells, −E2 or +E2. 
Two independent experiments analysed. f, Normalized genome-wide 
H3K4me2 within a window of 800 bp, centred at TRIM24 binding sites 
(designated as 0), +E2 (blue line) or −E2 (red line). 


and signal transduction (DAVID analyses, Supplementary Table 4). 
Biological pathway analysis (Ingenuity Systems, www.ingenuity.com) 
revealed that multiple gene targets of TRIM24 are associated with breast 
cancer (Supplementary Tables 5 and 6). The number of target sites 
shared by TRIM24 and ERα (1,677 sites) is similar to the number shared 
by ERα and FOXA1 (ref. 29), with little overlap among all three (263 sites) 
(Supplementary Fig. 14b). Consistent with our structural analyses, 
TRIM24 binding occurs globally at sites depleted of H3K4me2 
(Fig. 3f and Supplementary Figs 14d and 15). Thus, ERα-regulated 
genes may be divided into multiple classes, defined by specific co- 
regulators and their dependence on H3K4 methylation. 


TRIM24 is overexpressed in breast cancer 


Depletion of TRIM24 caused a significant decrease in ERα-mediated 
activation of GREB1, PR and pS2 gene expression (Fig. 4a and 
Supplementary Fig. 16a). Importantly, re-introduction of wild-type (WT), 
but not PHD finger mutant (C840W), TRIM24 fully restored ERα-mediated 
transcription activation (Fig. 4b), and enabled ERα-response 
at lower levels of hormone (Fig. 4c). Decreased ERα-mediated activation 
is due to loss of TRIM24-dependent ERα-interactions with chromatin 
(Fig. 4d and Supplementary Fig. 17), without alteration of ERα expression 
(Supplementary Fig. 16b). H3K4me2/3 levels at the distal ERE of 
GREB1 lack hormone responsiveness and are TRIM24-independent 

Figure 4 | TRIM24 functions as a co-activator and stabilizes ERα-chromatin 
interactions. a, Stable shControl and shTRIM24 MCF7 cells +E2. 
*P < 0.05; **P < 0.01. b, TRIM24 WT and TRIM24 C840W expressed in 
stable shTRIM24 MCF7 cells +E2. c, shControl and shTRIM24 MCF7 cells, E2 
range. TRIM24 WT or EGFP control expressed in shTRIM24 MCF7 cells. In 
a, b and c, GREB1 RNA levels are normalized to GAPDH; untreated shControl 
MCF7 is set as one. Each bar is an average of 3 biological replicates, 3 
independent RT-PCR assays of each; error bars, s.d. d, ChIP of ERα and 
TRIM24 binding, histone H3 and histone modifications, 6 h after E2 addition, 
shControl and shTRIM24 MCF7 cells. Histone modifications normalized for 
H3 recovery. Each bar represents averaged results, n = 3 and 3 assays of each; 
error bars, s.d. 


(Fig. 4d and Supplementary Fig. 16c). In contrast, nucleosomal 
occupancy at EREs is increased alongside decreased acetylation of H4, 
H3K23 and H3K27, reflecting loss of ERα-activated chromatin structure 
(Fig. 4d and Supplementary Fig. 16c). 

Depletion of TRIM24 led to reduced survival and proliferation of 
tumour-derived breast cancer cells, and is highly additive with 
4-OH-tamoxifen, an inhibitor of ERα (Fig. 5a). We immunostained tissue 
samples from a breast cancer patient cohort to assess the impact of 
TRIM24 expression on breast cancer survival (Fig. 5b). In 128 cases of 
non-metastatic breast cancer, expression of TRIM24 fell into four 
classes: N— and N+, undetectable to low level in few foci (29%); 
N++, abundant foci with expression in nuclear and cytoplasmic 

Figure 5 | Aberrant expression of TRIM24 correlates with poor survival of 
breast cancer patients. a, shControl and shTRIM24 MCF7 cells with E2 or E2 
+ 4-OH-tamoxifen (4-OHT), as indicated. Each bar represents the averaged 
results for three independent colony formation assays in triplicate plates; error 
bars, s.d. *P < 0.0001. b, Immunohistochemistry: 128 surgical specimens of 
breast cancer immunostained for TRIM24: subcellular localization (N) and 
staining intensity (strong, +++; moderate, ++; weak or slightly above 
background, +; none, −). c, The overall survival rate of 128 patients with 
non-metastatic disease, classified by TRIM24 expression (as in b), plotted by the 
Kaplan-Meier method. 


compartments (20%); and N+++, abundant foci with high expression 
in nuclei (51%). Overexpression of TRIM24 (+++, ++) is 
clearly correlated with poor patient survival, independent of ER status 
(Fig. 5c and Supplementary Table 7). 


Discussion 

Our identification of the PHD-Bromo as a reader of H3K4me0 and 
H3K23ac within a single histone tail, or of H3K4me0 and 
non-contiguous acetylated lysines, suggests that TRIM24 may have 
multiple roles in chromatin regulation. TRIM24 is a co-activator of ERα 
at distal EREs, a platform well suited for stable interactions with 
TRIM24 PHD-Bromo. ERα recruits histone acetyltransferases (for 
example, CBP/p300, GCN5 and P/CAF; ref. 33) to acetylate histones. 
LSD1 (KDM1), a biochemically and structurally characterized 
demethylase for H3K4me2/1 (refs 34, 35) and an androgen-regulated 
demethylase of H3K9me (ref. 36), is resident or rapidly recruited to 
EREs where H3K4 remains depleted of methylation even with oestrogen 
activation (Fig. 3c, Supplementary Fig. 18 and ref. 28). These parallel 
processes establish a combinatorial histone signature with high affinity 
for TRIM24 binding to chromatin. 

Aberrant expression of TRIM24 may promote tumour development 
and progression by multiple mechanisms of dysfunction. 
TRIM24 is a potent co-activator of ERα, which is associated with 
cellular proliferation and neoplasia in breast cells, and a negative 
regulator of p53 stability. TRIM24 is a target of chromosomal 
translocations to form oncogenic fusion proteins in acute promyelocytic 
leukaemia, papillary thyroid carcinoma and myeloproliferative 
syndrome. Here we have shown that TRIM24 expression is directly 
correlated with poor patient survival in both ER-positive and 
ER-negative breast cancer. These results suggest that TRIM24 is a dual 
domain histone reader with considerable potential as a therapeutic 
target in multiple cancers. 


METHODS SUMMARY 


Wild-type and mutant forms of TRIM24 PHD-Bromo were expressed in 
Escherichia coli and purified to homogeneity. Histone biotinylated peptides or 
purified histone proteins were incubated with GST-proteins, and bound proteins 
detected by immunoblotting. All crystals were obtained by the hanging-drop 
method at 20 °C; structures were solved by the molecular replacement method, 
and refined with cycled model building and refinement procedures. Histone 
peptides with or without biotin labelling were used for ITC binding. 
Fluorescein-labelled peptides were used for fluorescence polarization analysis. 
Stable short hairpin (sh)Control and shTRIM24 MCF7 cells were maintained 
with 2.5 µg ml−1 puromycin and, for hormone treatment, were grown in 
hormone-free media for 96 h before addition of ethanol or 10 nM oestradiol (Sigma) 
for indicated times. Global expression analyses and calculation of enrichment of 
shared TRIM24 and ERα binding at oestrogen-regulated genes were 
determined, and validated by real-time RT-PCR. Surgical specimens of breast cancer 
from 128 patients with non-metastatic disease were immunostained for TRIM24 (TRIM24 
antibody, Proteintech Group), and scored by subcellular localization, staining 
intensity, and fraction of positive staining. The overall survival after surgery 
was plotted by the Kaplan-Meier method. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

Cell culture. MCF7 and HEK293T cells were obtained from ATCC. Stable 
shControl and shTRIM24 MCF7 cells were maintained with 2.5 µg ml−1 
puromycin. For hormone treatment, MCF7 cells were plated under normal growth 
conditions (DMEM with 10% FBS and 1× penicillin/streptomycin). When cell 
density reached 25% confluence, the media were changed to hormone-free media 
(phenol red-free DMEM, 10% charcoal dextran-treated FBS) for 96h. The 
hormone-deprived MCF7 cells were treated with ethanol or 10 nM oestradiol 
(Sigma) for indicated times. The TRIM24 shRNA for stable cell line creation 
was AAGCA GGTGG AACAG GATAT TAAAG TTGC. 

Biotinylated peptide pulldown assay. Histone biotinylated peptides (1 µg, 
Millipore) were incubated with GST-proteins (2 µg) in 500 µl NTP buffer 
(300 mM NaCl, 50 mM Tris-HCl, 0.1% NP-40, pH 7.4) overnight at 4 °C. Then 20 µl 
of a 50% slurry of streptavidin-coated beads (Thermo) were added and incubated 
for 1 h at 4 °C. The beads were recovered by centrifugation and washed 6 times 
(10 min at 4 °C) with NTP. The peptide/protein-bound beads were resuspended in 
3× SDS-PAGE loading buffer, heated for 5 min at 95 °C and separated on 4-12% 
gradient polyacrylamide gels (Invitrogen). GST-proteins were detected by western 
blotting analysis with anti-GST antibody (Cell Signaling, dilution 1:1,000). 

GST protein pull-down assay, co-IP and transient transfection. GST-proteins 
(2 µg) were incubated with purified histone proteins (10 µg) in 500 µl NTP 
overnight at 4 °C. Then 25 µl of a 50% slurry of GST-beads were added and incubated for 2 h 
at 4 °C, recovered by centrifugation and washed 4 times (10 min at 4 °C) with 
NTP. The protein-bound beads were separated by SDS-PAGE and subjected to 
western blotting analysis. Anti-GST (Cell Signaling), anti-TRIM24 (Novus 
Biological), anti-histone H3 (Abcam), anti-H3K9me2 (Active Motif) and anti- 
H3K4me3 (Active Motif) antibodies were used in immunoblotting. 

Ethidium bromide-treated and precleared cell lysate was used for IP with 
TRIM24 antibody (2.5 µl, Novus Biological) or control rabbit IgG antibody 
(2.5 µl, Upstate) as previously described. Then anti-ER (F-10, Santa Cruz) 
and anti-TRIM24 were used in western blotting analysis. Transient transfections 
were performed as previously described, but with Effectene (Qiagen) used 
according to the manufacturer’s directions. 

Chromatin immunoprecipitation assay (ChIP). ChIP assays were performed as 
described, with minimal modification. Briefly, treated MCF7 cells were 
cross-linked with 1% formaldehyde for 15 min. The cross-linking reaction was stopped 
with 0.125 M glycine. The cross-linked cells were washed with PBS three times 
and stored at −80 °C before use. The fragmented, precleared chromatin lysate 
was incubated overnight with specific antibodies: ER (F-10, Santa Cruz), TRIM24 
(Novus Biological), histone H3 (Abcam), H3K4me2 (Active Motif), H3K4me3 
(Active Motif), H3K23ac (Active Motif), H3K27ac (Active Motif), H4ac 
(Upstate/Millipore) or normal sheep IgG (Upstate/Millipore). To analyse 
specific, antibody- and protein-bound DNA, qPCR was conducted in a 7500 FAST 
ABI instrument. PCR primers were used as described. 

ChIP-sequencing analysis. Purified ChIP DNA was prepared for sequencing 
using the Illumina ChIP-Seq sample preparation kit (IP-102-1001, Illumina). 
Sequencing was performed on an Illumina Genome Analyser II using 36 cycles. 
Sequences were aligned to human genome release hg18 (ref. 47) using the 
ELAND software at the default setting of 0 to 2 allowed mismatches. 

High-quality aligned read data were converted to the BED format and analysed 
for peaks using the MACS software. Peaks showing altered pull-down 
patterns were mapped to the genome, and a list of candidate genes flanking each 
peak was generated. FOXA1 (ChIP-chip) binding regions were obtained from 
http://research.dfci.harvard.edu/brownlab/datasets/. Two regions were counted 
as overlapping if either shared more than 80% of its length with the 
other region. Venn diagrams were prepared on the basis of that assumption. 
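
The overlap rule can be written down in a few lines. The sketch below is one possible reading of it, not the original analysis code: it assumes half-open BED-style coordinates and treats two regions as overlapping when the shared stretch exceeds 80% of the length of either region. The `Region` type and the function names are hypothetical.

```python
from typing import NamedTuple


class Region(NamedTuple):
    """A genomic interval in BED convention (0-based start, exclusive end)."""
    chrom: str
    start: int
    end: int


def overlap_length(a: Region, b: Region) -> int:
    """Number of base pairs shared by two regions (0 if on different chromosomes)."""
    if a.chrom != b.chrom:
        return 0
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def regions_overlap(a: Region, b: Region, min_fraction: float = 0.8) -> bool:
    """True if the shared stretch exceeds min_fraction of either region's length.

    Assumption: 'either region' is the intended meaning of the rule in the text.
    """
    shared = overlap_length(a, b)
    if shared == 0:
        return False
    return (shared / (a.end - a.start) > min_fraction
            or shared / (b.end - b.start) > min_fraction)
```

For example, a 100 bp region that shares 90 bp with a much longer region passes the test, whereas two 100 bp regions that share 50 bp do not.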

To identify possible target genes, a RefSeq data file was downloaded from 
UCSC genome browser (hg18). Genes within 10 kb of the binding regions were 
marked as target genes; DAVID was used to assess biological functions of 
targets, and Ingenuity Pathway Analysis (IPA) software was used to ascertain 
the pathways that are enriched by these target genes (Ingenuity Systems, 
www.ingenuity.com). For H3K4me2 signals around the TRIM24 binding regions, in the 
presence or absence of oestrogen, numbers of fragments that overlap each posi- 
tion were counted. The Input signal was calculated by the same procedure and 
overall H3K4me2 signal values were normalized by Input signal. 
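
As an illustration of the 10 kb rule for target genes, the sketch below marks a gene as a target when the gap between the gene interval and any binding region on the same chromosome is at most 10,000 bp. Measuring the distance from the edges of the annotated gene interval (rather than from the transcription start site) is an assumption, and the tuple layouts and function names are hypothetical.

```python
def interval_gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Gap in bp between two half-open intervals; 0 if they overlap or touch."""
    return max(0, max(a_start, b_start) - min(a_end, b_end))


def target_genes(genes, regions, max_distance=10_000):
    """Return sorted names of genes lying within max_distance of any region.

    genes:   iterable of (name, chrom, start, end)
    regions: iterable of (chrom, start, end)
    """
    regions = list(regions)
    targets = set()
    for name, g_chrom, g_start, g_end in genes:
        for r_chrom, r_start, r_end in regions:
            if g_chrom != r_chrom:
                continue
            if interval_gap(g_start, g_end, r_start, r_end) <= max_distance:
                targets.add(name)
                break  # one nearby region is enough to mark the gene
    return sorted(targets)


# Made-up coordinates, for illustration only.
example_genes = [
    ("geneA", "chr1", 50_000, 60_000),
    ("geneB", "chr1", 200_000, 210_000),
    ("geneC", "chr2", 50_000, 60_000),
]
example_regions = [("chr1", 65_000, 65_500)]
example_targets = target_genes(example_genes, example_regions)
```

With these made-up coordinates only `geneA`, which ends 5 kb upstream of the binding region, is marked as a target.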

For intersection with global expression analyses and calculation of enrichment 
of shared TRIM24 and ER binding at oestrogen-regulated genes, raw CEL files 
were downloaded from the GEO database. RMA normalization was used with 
default options (with background correction, quantile normalization, and log 
transformation) to normalize the intensity of each probeset. The SAM statistical 
method was used to select differentially expressed genes, based on a q-value of less 
than 2%. After filtration, a total of 1,887 genes were selected as differentially 
expressed after oestrogen treatment. Venn diagrams were formed based on the 
proximity of binding regions to genes within 10 kb of the binding sites. P-values 
were calculated based on binomial tests. 

Real time RT-PCR analysis. RNA was isolated at indicated time points using 
Trizol reagent (Invitrogen) following manufacturer’s instructions. cDNA was 
synthesized using a RT-PCR kit (Invitrogen) according to manufacturer’s 
instructions. Synthesized cDNA was treated with RNaseH before performing 
PCR reactions. Real time PCR reactions were performed in an Applied 
Biosystems 7500 Fast Real time PCR instrument. Generally, 10 µl PCR reactions 
were set up in 96-well plates. The reaction mixtures contained 2 µl diluted cDNA 
(1:10 dilution), 0.25 µl each of forward and reverse primers (20 µM), 5 µl 2× 
SYBR Green Mix and 2.5 µl water. PCR primers were used as described. 
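
As a quick consistency check on the reaction set-up, the listed components can be summed and the final primer concentration derived. The sketch below uses only the volumes and stock concentration given above; the `master_mix` helper and its pipetting excess are hypothetical conveniences, not part of the described protocol.

```python
# Component volumes (microlitres) for one reaction, as listed in the text.
REACTION_UL = {
    "diluted cDNA (1:10)": 2.0,
    "forward primer": 0.25,
    "reverse primer": 0.25,
    "2x SYBR Green Mix": 5.0,
    "water": 2.5,
}
PRIMER_STOCK_UM = 20.0  # primer stock concentration (micromolar) from the text

total_ul = sum(REACTION_UL.values())

# Dilution of the primer stock into the full reaction volume: C1 * V1 / V_total.
final_primer_um = PRIMER_STOCK_UM * REACTION_UL["forward primer"] / total_ul


def master_mix(n_reactions: int, excess: float = 0.1) -> dict:
    """Scale all components except cDNA for n reactions.

    'excess' is a hypothetical pipetting allowance (10% by default).
    """
    factor = n_reactions * (1 + excess)
    return {
        name: volume * factor
        for name, volume in REACTION_UL.items()
        if not name.startswith("diluted cDNA")
    }


print(f"total volume: {total_ul} µl, final primer concentration: {final_primer_um} µM")
```

The components add up to the stated 10 µl, and each primer ends up at 0.5 µM in the reaction.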

GraphPad Prism5 software was used for analysis of P-values based on at least 
two independent experiments in three independent PCR reactions. The two-tailed 
paired t-test was used to compare the differences between two groups for relative 
fold. The two-tailed unpaired t-test was used to compare the differences between 
two groups for actual percentage. P-values <0.05 were considered statistically 
significant. 

Colony formation assay. 500 stable control shRNA depleted or TRIM24 shRNA 
depleted MCF7 cells were distributed in DMEM with 10% FBS and 1× P/S and 
then placed in 60 mm plates at 37 °C for 14 days. Colonies were stained by crystal 
violet 14 days after seeding. Colonies of >50 cells were counted using a dissecting 
microscope. Each sample was performed in triplicate. 

Clinical patient samples and immunohistochemistry. We obtained 134 
archived blocks containing formalin-fixed, paraffin-embedded infiltrating breast 
carcinoma from the Department of Pathology, Shanghai East Breast Disease 
Hospital, China. All of the patients were women with non-metastatic disease 
who had undergone mastectomy and axillary lymph node dissection between 
1988 and 1994. After surgical treatment, the patients were offered adjuvant 
chemotherapy and/or radiotherapy and hormone therapy, depending on the 
number of lymph node metastases, status of menopause, and oestrogen and/or 
progesterone receptor positivity. The clinicopathological characteristics of the 
study population, including age, tumour size, lymph node status, tumour grade, 
and oestrogen receptor/progesterone receptor positivity, were obtained from 
medical records. The stage was assessed by the TNM clinical staging system of 
the American Joint Committee on Cancer. Patients were followed for 
4-72 months, with a median follow-up of 48 months. We have follow-up 
information from 128 patients. Most of the tissues in this cohort were described 
previously, and used as approved by the Institutional Review Board. 

128 surgical specimens of breast cancer were immunostained for TRIM24 
(TRIM24 antibody, Proteintech Group). Immunoreactivity of the TRIM24 antibody 
was scored according to the subcellular localization (nuclear and/or cytoplasmic), 
staining intensity (strong +++, moderate ++, weak + and faint or slightly above 
background), and fraction of positive staining. The mean fraction of positive tumour 
cells was determined in at least nine areas at ×100 or ×200 magnification. 
Immunoreactivity scores were as described previously. For example, 0 is no staining; 
+ is weak staining; ++ is moderate staining; +++ is strong staining and the values 
for percentage of positive tumour cells: 0 = 0-0.9%; 1 = 1-10%; 2 = 11-50%; 
3 = 51-80%; 4 = 81-100%. Magnification was ×400; at least 200 tumour cells were 
counted. ER assessment was done in a similar manner. 
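
The percentage bins above translate into a simple lookup. The following sketch maps a percentage of positive tumour cells to the 0-4 score using the stated boundaries; the function name is hypothetical, and the handling of fractional values that fall between two listed bins (for example 10.5%) is an assumption, since the text lists whole-number ranges only.

```python
def positive_fraction_score(percent_positive: float) -> int:
    """Map the percentage of positive tumour cells to the 0-4 score.

    Bins from the text: 0 = 0-0.9%; 1 = 1-10%; 2 = 11-50%;
    3 = 51-80%; 4 = 81-100%.
    Assumption: a value between two listed bins goes to the higher bin.
    """
    if not 0 <= percent_positive <= 100:
        raise ValueError("percentage must lie between 0 and 100")
    if percent_positive < 1:
        return 0
    if percent_positive <= 10:
        return 1
    if percent_positive <= 50:
        return 2
    if percent_positive <= 80:
        return 3
    return 4
```

For example, a specimen with 45% positive cells scores 2 and one with 85% positive cells scores 4.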

Data on eligible patients were summarized by use of standard descriptive 
statistics and frequency tabulation. The overall survival after surgery was plotted 
by use of the Kaplan-Meier method. The log-rank test was used to analyse 
differences in survival time. All tests were two-sided and the level of significance 
was set at 0.05. 
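
For readers unfamiliar with the Kaplan-Meier method, a minimal sketch of the product-limit estimator follows. It is a generic textbook implementation, not the software used for the analysis, and the follow-up times in the example are invented for illustration; the log-rank test is not implemented here.

```python
def kaplan_meier(durations, events):
    """Product-limit estimate of the survival function.

    durations: follow-up time for each patient (e.g. months)
    events:    True if death was observed at that time, False if censored
    Returns a list of (time, survival_probability) at each time with a death.
    """
    if len(durations) != len(events):
        raise ValueError("durations and events must have the same length")
    samples = sorted(zip(durations, events))
    at_risk = len(samples)
    survival = 1.0
    curve = []
    i = 0
    while i < len(samples):
        time = samples[i][0]
        deaths = 0
        leaving = 0
        # Group all patients whose follow-up ends at this time point.
        while i < len(samples) and samples[i][0] == time:
            if samples[i][1]:
                deaths += 1
            leaving += 1
            i += 1
        if deaths:
            # Patients censored at this time still count as at risk here.
            survival *= 1 - deaths / at_risk
            curve.append((time, survival))
        at_risk -= leaving
    return curve


# Invented follow-up data (months), for illustration only.
example_curve = kaplan_meier(
    durations=[6, 12, 12, 24, 48],
    events=[True, False, True, True, False],
)
```

Each step of the curve multiplies the running survival probability by the fraction of at-risk patients who survive that time point, so censored patients contribute to the denominator until they leave the study.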

Histone peptide microarray hybridization. Peptide microarray assays were 
performed as described. Briefly, biotinylated histone peptides were printed in 
hexaplicate onto a streptavidin-coated slide (ArrayIt) using a VersArray 
Compact Microarrayer (BioRad). After short blocking with biotin (Sigma), the 
slides were incubated with the GST-TRIM24 PHD-Bromo in binding buffer 
(300 mM NaCl, 50 mM Tris-HCl pH 7.5, 0.1% NP-40, 1 mM PMSF, 20% fetal bovine 
serum) overnight at 4 °C with gentle agitation. After washing with the same 
buffer, slides were probed with anti-GST antibody and then 
fluorescein-conjugated secondary antibody and visualized with a GenePix 4000 scanner (Molecular 
Devices). 

Protein preparation. As the long linker (30 residues) between TRIM24 PHD 
finger and bromodomain was predicted to be a loop, we first expressed PHD 
finger (824-886) and bromodomain (896-1016) separately, and then checked 
their interaction by the gel-filtration method. The mixture of PHD finger and 
bromodomain contained a higher molecular weight peak that corresponded to 
the complex fraction (Supplementary Fig. 16), which indicates that TRIM24 
PHD-Bromo forms a structural unit with direct interactions between the 
domains. Next, we focused on the expression and crystallization of the 
TRIM24 PHD-Bromo dual domain. We cloned different lengths of TRIM24 
PHD-Bromo fragments into pRSFDuet-1 vector (modified pRSFDuet-1 vector 
with 6×His plus yeast sumo as fusion tag). Fragment 824-1011 was used for all 
the binding assays. Further, all the mutations were introduced into the same 
construct using a QuikChange Kit (Stratagene). Fragment 824-1006 was used 
for all crystallographic studies. Protein expression was carried out in BL21 (DE3) 
E. coli cells. Cells were grown until A600 reached around 1.0, then the media were 
cooled and 0.2 mM IPTG and 0.1 mM ZnCl2 were added to the culture to induce 
protein expression at 25 °C for 8 h. 

E. coli-expressed wild-type and mutant TRIM24 PHD-Bromo were affinity 
purified by nickel-charged HisTrap Chelating FF columns (GE Healthcare). 
E. coli cells were disrupted by sonication in loading buffer (20 mM Tris (pH 
8.0), 500 mM NaCl, 20 mM imidazole). After centrifugation, the supernatant 
was loaded onto a nickel column and was washed extensively with loading buffer. 
The target protein was eluted with a linear gradient of 50-500 mM imidazole. The 
target protein was collected and dialysed overnight in loading buffer, with Ulp1 
protease added to cleave the 6×His-sumo tag. Digested protein was loaded onto a 
nickel column again to remove the 6×His-sumo tag and the His-tagged protease. 
The flow-through containing TRIM24 PHD-Bromo was collected and further 
purified by a Hiload 75 26/60 column in buffer containing 10 mM Tris (pH 8.0), 
100 mM NaCl and 5 mM DTT. The major peak was pooled, concentrated to high 
concentration and stored in a −80 °C freezer. 

Crystallization. Crystals of TRIM24 PHD-Bromo in the free form were grown by 
mixing equal volumes of 20 mg ml−1 protein and crystallization buffer (100 mM 
Hepes (pH 7.5), 2.0 M ammonium sulphate and 2% polyethylene glycol 400) at 
20 °C. Crystals appeared after two days and reached their full lengths within a 
week. 

Crystals of the complex of TRIM24 PHD-Bromo and unmodified H3(1-10)K4 
peptide were grown by incubating 20 mg ml−1 protein with peptide at a 1:1.5 
ratio, followed by mixing with an equal volume of crystallization buffer (50 mM 
Tris (pH 7.5), 30% polyethylene glycol monomethyl ether 5000, and 100 mM 
NaCl) at 20 °C. Crystals appeared within a week and reached their full size in two 
weeks. 

Crystals of the complex of TRIM24 PHD-Bromo and H3(13-32)K23ac peptide 
were grown by mixing 20 mg ml−1 protein with peptide at a 1:1.5 ratio, followed by 
mixing with an equal volume of crystallization buffer (100 mM sodium citrate 
(pH 5.6), 200 mM potassium/sodium tartrate tetrahydrate and 1.6 M ammonium 
sulphate) at 20°C. 

Crystals of the complex of TRIM24 PHD-Bromo and H3(23-31)K27ac peptide 
were grown by mixing 20 mg ml−1 protein with peptide at a 1:3 ratio, followed by 
mixing with an equal volume of crystallization buffer (50 mM Tris (pH 8.0), 
200 mM sodium acetate, 30% polyethylene glycol 4000) at 20 °C. Initially, we 
could only get twinned crystals that could not be used for structure 
determination. Later, single crystals were obtained by including the detergent CHAPSO 
(0.8 mM; Hampton Research) in the crystallization buffer. 

Crystals of the complex of TRIM24 PHD-Bromo and H4(14-19)K16ac peptide 
were prepared by mixing 20 mg ml−1 protein with peptide at a 1:3 ratio, followed 
by mixing with an equal volume of crystallization buffer (50 mM Bis-Tris (pH 
6.5), 30% polyethylene glycol 3350, 100 mM ammonium acetate) at 20 °C. 
Crystals appeared within a day and reached their full lengths in a week. Fresh 
crystals (grown within a week) were used for data collection, as crystals grown for 
longer than 3 weeks diffracted poorly. For all the crystals mentioned above, 
crystallization buffers with 20% glycerol were used as cryoprotectant. 

Data collection and structure determination. Data sets for crystals of TRIM24 
PHD-Bromo in the free state, as well as TRIM24 PHD-Bromo complexes with 
unmodified H3(1-10)K4, with H3(13-32)K23ac and with H4(14-19)K16ac pep- 
tides were collected at NE-CAT beamline 24ID-E, Advanced Photon Source, 
Chicago. Data sets for crystals of the complex of TRIM24 PHD-Bromo and 
H3(23-31)K27ac peptide were collected at beamline X29 at the National 
Synchrotron Light Source (NSLS, Brookhaven National Laboratories). All the 
data sets were integrated and scaled with the HKL2000 suite. The 
free-form TRIM24 PHD-Bromo crystal belongs to the P1 space group and contains 
four molecules per asymmetric unit. The structure was solved by the molecular 
replacement method using the PHASER program, using the BHC80 PHD finger 
(PDB coordinate 2PUY) and TRIM24 bromodomain (PDB coordinate 2YYN) as 
search models, while searching for four copies of each domain in the asymmetric 
unit. The initial model was rebuilt with COOT, cycled with structure refinement 
by CNS. The structures of all the complexes were solved by molecular replacement 
with free-form TRIM24 PHD-Bromo as the model. All the crystallographic 
statistics are listed in Supplementary Table 1. 

Peptide synthesis and fluorescence polarization based measurement. H3 
peptides (residues 1-33) used for isothermal titration calorimetry (ITC) and for 
fluorescence polarization (FP) measurements were synthesized by FMOC strategy 
using pseudo prolines at critical positions. Details of peptide synthesis are available 
on request. For FP assays, peptides were labelled using fluorescein-NHS ester 
(Invitrogen). Labelled peptides were purified on G10 gel filtration resin (GE 
Healthcare) and by RP-C18 HPLC. Single labelled species were identified by mass 
spectrometry. FP assays were essentially carried out and analysed as described 
using FP buffer (10 mM Tris-HCl, 100 mM NaCl, 5 mM DTT, pH 7.4). 
Titration series of 10 µl volume in 384-well plates were read multiple times on a 
Plate Chameleon II plate reader (HIDEX Oy). Multiple readings and independent 
titration series were averaged after data normalization. 

ITC measurements. Calorimetric experiments were conducted at 25°C with a 
MicroCal iTC200 instrument. Recombinant wild-type TRIM24 PHD-Bromo and 
its mutants were dialysed overnight against 20 mM Tris (pH 7.5), 50 mM NaCl 
and 2 mM β-mercaptoethanol. Aliquots of lyophilized peptides were dissolved in 
the same buffer before use. Calorimetric titration was performed by injecting 
synthetic peptide into wild-type TRIM24 PHD-Bromo or its mutants at various 
concentrations. Calorimetric titration data were fitted using Origin 7.0 software 
on the basis of a 1:1 binding stoichiometry. 

Peptides used for the crystallization and binding assays. 
H3(1-10)K4, ARTKQTARKS; 
H3(1-21)K4, ARTKQTARKSTGGKAPRKQLAGGK-Biotin; 
H3(1-15)K4, ARTKQTARKSTGGKAY; 
H3(1-15)K4me1, ARTK(me1)QTARKSTGGKAY; 
H3(1-15)K4me2, ARTK(me2)QTARKSTGGKAY; 
H3(1-15)K4me3, ARTK(me3)QTARKSTGGKAY; 
H3(1-19)K14ac, ARTKQTARKSTGGK(ac)APRKQ; 
H3(1-20)K9ac, ARTKQTARK(ac)STGGKAPRKQL; 
H3(23-31)K27ac, KAARK(ac)SAPA; 
H3(13-32)K27ac, GKAPRKQLATKAARK(ac)SAPATYK-Biotin; 
H3(13-32)K23ac, GKAPRKQLATK(ac)AARKSAPATYK-Biotin; 
H3(1-33)K4K23ac-Bio, ARTKQTARKSTGGKAPRKQLATK(ac)AARKSAPATGYK-Biotin; 
H3(1-33)K4me3K23ac-Bio, ARTK(me3)QTARKSTGGKAPRKQLATK(ac)AARKSAPATGYK-Biotin; 
H3(1-33)K4-Bio, ARTKQTARKSTGGKAPRKQLATKAARKSAPATGYK-Biotin; 
H3(1-33)K4, ARTKQTARKSTGGKAPRKQLATKAARKSAPATG; 
H3K4K23ac, ARTKQTARKSTGGKAPRKQLATK(ac)AARKSAPATG; 
H3K4me3K23ac, ARTK(me3)QTARKSTGGKAPRKQLATK(ac)AARKSAPATG; 
H4(14-19)K16ac, GAK(ac)RHR; 
H4(1-20)K16ac, SGRGKGGKGLGKGGAK(ac)RHRK; 
H4(1-21), SGRGKGGKGLGKGGAKRHRKVGGK-Biotin; 
H4(1-27)K5ac, SGRGK(ac)GGKGLGKGGAKRHRKVLRDNIQ-PEG-Biotin; 
H4(1-27)K8ac, SGRGKGGK(ac)GLGKGGAKRHRKVLRDNIQ-PEG-Biotin; 
H4(1-27)K12ac, SGRGKGGKGLGK(ac)GGAKRHRKVLRDNIQ-PEG-Biotin; 
H4(1-27)K16ac, SGRGKGGKGLGKGGAK(ac)RHRKVLRDNIQ-PEG-Biotin; 
H4(1-27)K20ac, SGRGKGGKGLGKGGAKRHRK(ac)VLRDNIQ-PEG-Biotin. 

The adipose-derived hormone leptin maintains energy balance in part through central nervous system-mediated 
increases in sympathetic outflow that enhance fat burning. Triggering of β-adrenergic receptors in adipocytes 
stimulates energy expenditure by cyclic AMP (cAMP)-dependent increases in lipolysis and fatty-acid oxidation. 
Although the mechanism is unclear, catecholamine signalling is thought to be disrupted in obesity, leading to the 
development of insulin resistance. Here we show that the cAMP response element binding (CREB) coactivator Crtc3 
promotes obesity by attenuating β-adrenergic receptor signalling in adipose tissue. Crtc3 was activated in response to 
catecholamine signals, when it reduced adenyl cyclase activity by upregulating the expression of Rgs2, a 
GTPase-activating protein that also inhibits adenyl cyclase activity. As a common human CRTC3 variant with 
increased transcriptional activity is associated with adiposity in two distinct Mexican-American cohorts, these 
results suggest that adipocyte CRTC3 may play a role in the development of obesity in humans. 


Obesity is a major risk factor in the development of insulin resistance, 
which is characterized by decreased glucose uptake into muscle and 
increased glucose production by the liver. Obesity affects one-third of 
adults in the USA; the prevalence of obesity appears even higher in 
certain ethnic groups, although relevant predisposing factors have not 
been fully identified. Obesity is a particular problem among 
Mexican-Americans, with an overall prevalence of 40% (36% in men, 45% in 
women), contributing to elevated rates of diabetes. Environment, 
lifestyle and genetic susceptibility probably contribute to the increased 
risk of obesity and diabetes in this population. 

Under lean conditions, the adipose-derived hormone leptin is 
thought to promote energy expenditure through increases in 
sympathetic nerve activity that enhance catecholamine signalling in white 
adipose tissue (WAT) and brown adipose tissue (BAT). Triggering 
of β-adrenergic receptors appears important for subsequent increases 
in lipolysis and fatty-acid oxidation; mice with knockouts of all three 
β receptors have reduced energy expenditure, and they are more 
susceptible to effects of high-fat diet (HFD) feeding on weight gain. 
Conversely, transgenic over-expression of β-adrenergic receptor 1 in 
adipose tissue appears sufficient to confer resistance to obesity. 

Triggering of B-adrenergic receptors stimulates cAMP-mediated 
increases in cellular gene expression with burst-attenuation kinetics”*; 
rates of transcription peak within 1h of stimulation and decrease 
thereafter even under continuous stimulation. Although the under- 
lying mechanisms remain unclear, the attenuation of cellular genes is 
thought to be coordinated by negative feedback effectors, which are 
themselves targets for upregulation by cAMP’. 
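
The burst-attenuation behaviour described above can be illustrated with a toy negative-feedback loop. The sketch below is not a model taken from this study: it assumes a constant stimulus, a cAMP-like signal `c` whose production is inhibited by a feedback effector `r`, and an effector that is itself induced by `c`. All rate constants are hypothetical values chosen only to make the qualitative shape visible, and transcription rate is assumed to be proportional to `c`.

```python
def simulate_feedback(hours=24.0, dt=0.001, s=1.0, k_c=5.0, k_r=1.0, d_r=0.2, K=0.2):
    """Euler integration of a two-variable negative-feedback loop.

    c: cAMP-like signal (transcription rate assumed proportional to c)
    r: feedback effector induced by c that inhibits production of c
    All parameters are hypothetical and for illustration only.
    """
    c, r = 0.0, 0.0
    times, signal = [0.0], [0.0]
    steps = int(round(hours / dt))
    for i in range(1, steps + 1):
        dc = s / (1.0 + r / K) - k_c * c   # stimulated production, damped by r
        dr = k_r * c - d_r * r             # effector induced by c, slowly cleared
        c += dc * dt
        r += dr * dt
        times.append(i * dt)
        signal.append(c)
    return times, signal


times, signal = simulate_feedback()
peak_index = max(range(len(signal)), key=signal.__getitem__)
peak_time = times[peak_index]
peak_value = signal[peak_index]
final_value = signal[-1]
print(f"peak at {peak_time:.2f} h; peak {peak_value:.3f}; value at 24 h {final_value:.3f}")
```

With these assumed parameters the signal peaks within the first hour and then settles at a lower plateau although the stimulus never changes, which is the qualitative pattern the feedback hypothesis predicts.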

cAMP stimulates the expression of cellular genes through the protein 
kinase A (PKA)-mediated phosphorylation of CREB family members 
(CREB1, ATF1, CREM), a modification that promotes recruitment of 
the histone acetyl transferase paralogues P300 and CBP'*”. 

In parallel, cAMP also increases gene expression by stimulating the 
CREB regulated transcriptional coactivators (CRTCs)'*"*. Under basal
conditions, CRTCs are sequestered in the cytoplasm through phos-
phorylation-dependent interactions with 14-3-3 proteins. CRTCs are 
phosphorylated by salt-inducible kinases and other members of the 
stress- and energy-sensing AMPK family of Ser/Thr kinases. Increases 
in intracellular cAMP signalling promote the PKA-mediated phos- 
phorylation and inhibition of salt-inducible kinase activity, leading 
to the subsequent dephosphorylation and nuclear entry of CRTCs, 
which bind to CREB over relevant promoters. After prolonged 
stimulation with cAMP agonist, CRTC activity is terminated through 
ubiquitin-mediated degradation”. 

The CRTC family consists of three members (Crtc1, Crtc2 and Crtc3),
which are distinguished in part by their expression profiles. Crtc1 is
produced primarily in brain, where it mediates leptin effects on satiety’®;
mice with a knockout of the Crtc1 gene develop obesity due in part to
reductions in energy expenditure. By contrast, Crtc2 is expressed at high 
levels in liver where it promotes fasting gluconeogenesis'”"*; mice with a 
knockout of Crtc2 appear more insulin sensitive under HFD conditions, 
owing to reductions in hepatic glucose output”. 


Role of CRTC3 as a CREB coactivator 


Similar to other CRTC family members, Crtc3 contains CREB binding 
(CBD; amino acids 1-50), regulatory (RD; amino acids 51-549) and 
trans-activation domains (TAD; amino acids 550-619), which are also 
present in Crtc1 and Crtc2 (Fig. 1a). In the basal state, Crtc3 is
phosphorylated at Ser 162 by salt-inducible kinases and other members of
the stress- and energy-sensing AMPK family of Ser/Thr kinases’*°”". 
Short-term (0.5-1h) exposure to cAMP agonist promotes the depho- 
sphorylation and nuclear entry of Crtc3 (Fig. 1a); similar to Crtc2", 
prolonged cAMP stimulation triggers Crtc3 degradation. 

Crtc3 over-expression augments the activity of a cAMP responsive 
(CRE-luc) reporter in cells exposed to forskolin (FSK; Fig. 1b); and 
mutation of the regulatory Ser162 phosphorylation site to alanine fur- 
ther enhances Crtc3 activity under basal conditions. In keeping with the 

Figure 1 | Crtc3−/− mice are resistant to obesity. a, Top, CREB-binding
(CBD), regulatory (RD) and transactivation (TAD) domains and conserved
AMPK/salt-inducible kinase phosphorylation site (Ser 162). Consensus
phosphorylation site for AMPK family members (ΦxBTSxxxΦ) shown; relative
position of hydrophobic (Φ), basic (B), Thr (T), and phosphorylated Ser (S)
residues indicated. x represents any amino acid. Middle, immunoblot of Crtc3
in wild-type (WT) and Crtc3−/− (knockout, KO) MEFs exposed to FSK.
Bottom, effect of FSK on nuclear and cytoplasmic Crtc3 levels. b, Effect of
wild-type or S162A Crtc3 on CRE-luciferase activity. c, Quantitative PCR (top) and


proposed role of CREB in recruiting CRTC3 to relevant promoters, 
expression of a dominant negative CREB inhibitor, called ACREB”, 
blocks Crtc3 effects on reporter activity in cells exposed to FSK. By 

immunoblot (bottom) analysis of Crtc3 tissue expression. BAT, brown adipose;
CBM, cerebellum; CTX, cortex; LIV, liver; MUS, skeletal muscle; PAN,
pancreas; WAT, white adipose. d, Top, Crtc3 targeting vector with Neo
selection marker replacing Exon 1, which encodes the CBD. Bottom, PCR
analysis of wild-type and mutant Crtc3 alleles in mice. e, Weight gain in
wild-type and Crtc3 mutant mice maintained on normal chow (n=8 per group) or
HFD (n=5) (*P<0.05; **P<0.01; ***P<0.001). f, Fat mass (left) and
photograph (right) of HFD-fed wild-type and Crtc3−/− mice (n=4 per group)
(*P<0.05). Error bars, s.e.m.


contrast with Crtc1, which is expressed primarily in brain, Crtc3 protein
and messenger RNA (mRNA) amounts are particularly abundant in 
WAT and to a lesser extent in BAT (Supplementary Fig. 1 and Fig. 1c). 

Based on the importance of the CBD for Crtc-mediated induction
of cAMP-responsive genes”**4, we generated Crtc3−/− mice with a
deletion of exon 1, which encodes the CBD (Fig. 1d). Crtc3−/− mice
are born at the expected Mendelian frequency; they appear com-
parable to wild-type littermates at birth, despite the absence of detect-
able Crtc3 mRNA and protein amounts in all tissues (Fig. 1c).


Role of CRTC3 in energy balance


When maintained on a normal chow diet, Crtc3−/− mice appear more
insulin sensitive than controls by insulin tolerance testing (Sup-
plementary Fig. 1, right). Crtc3−/− animals also have 50% lower adipose
tissue mass, despite comparable food intake and physical activity to 
control mice (Supplementary Fig. 2). 

When transferred to an HFD (60% of calories from fat), Crtc3−/− mice
gained 35% less weight relative to controls, reflecting primarily differences
in fat accumulation (Fig. 1e, f). The effect of Crtc3 on adiposity appeared
to be dependent on gene dosage, as Crtc3+/− mice showed intermediate
weight gains relative to wild-type and Crtc3−/− mice. Although physical
activity and food intake were nearly identical, energy expenditure and
oxygen consumption were substantially elevated in HFD-fed Crtc3−/−
mice relative to wild-type littermates (Fig. 2a, b). Pointing to parallel
increases in glucose and lipid oxidation, respiratory quotients were
comparable in wild-type and Crtc3−/− mice (Supplementary Fig. 3).
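
The reasoning in the paragraph above (higher energy expenditure with an unchanged respiratory quotient implies that glucose and lipid oxidation rose in parallel) can be made concrete with a short calculation. The sketch below uses the abbreviated Weir equation, which is a standard indirect-calorimetry formula but is an assumption here, because the text does not state how energy expenditure was derived. The gas-exchange values are hypothetical and are not data from this study.

```python
def respiratory_quotient(vo2_litres, vco2_litres):
    """Ratio of CO2 produced to O2 consumed (about 0.7 for fat, 1.0 for carbohydrate)."""
    if vo2_litres <= 0:
        raise ValueError("oxygen consumption must be positive")
    return vco2_litres / vo2_litres


def energy_expenditure_kcal(vo2_litres, vco2_litres):
    """Abbreviated Weir equation: kcal released for the given gas volumes in litres."""
    return 3.941 * vo2_litres + 1.106 * vco2_litres


# Hypothetical gas exchange over the same time window (litres); illustration only.
wild_type = {"vo2": 1.00, "vco2": 0.80}
knockout = {"vo2": 1.20, "vco2": 0.96}

rq_wt = respiratory_quotient(wild_type["vo2"], wild_type["vco2"])
rq_ko = respiratory_quotient(knockout["vo2"], knockout["vco2"])
ee_wt = energy_expenditure_kcal(wild_type["vo2"], wild_type["vco2"])
ee_ko = energy_expenditure_kcal(knockout["vo2"], knockout["vco2"])

print(f"RQ wild type {rq_wt:.2f}, RQ knockout {rq_ko:.2f}")
print(f"EE wild type {ee_wt:.2f} kcal, EE knockout {ee_ko:.2f} kcal")
```

In this invented example both oxygen consumption and carbon dioxide production are 20% higher in the knockout, so energy expenditure rises while the ratio between them, and hence the inferred fuel mix, stays the same.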

Circulating concentrations of free fatty acids were decreased in 
Crtc3−/− mice, and they were protected from the effects of HFD
feeding on hepatic steatosis (Fig. 2c). Consistent with their reduced
fat mass, Crtc3−/− mice had decreased circulating leptin concentra-
tions compared with wild-type littermates, although the reduction in
leptin levels (tenfold) appeared disproportionately low relative to the 
difference in fat mass (threefold) (Fig. 2d and Supplementary Fig. 4). 
Indeed, intraperitoneal administration of leptin stimulated energy 
expenditure to a greater extent in Crtc3 mutant than wild-type mice. 
Taken together, these results indicate that disruption of Crtc3 activity 
leads to increases in energy expenditure, which maintain leptin sensi- 
tivity and protect against ectopic lipid accumulation. 

Under obese conditions, increases in inflammatory infiltrates in 
adipose tissue contribute to the development of systemic insulin res- 
istance**. Although they were readily observed in wild-type mice, 
adipose-tissue macrophages were less abundant in Crtc3−/− tissue
(Fig. 2e and Supplementary Fig. 5). Arguing against an effect of the
Crtc3 knockout on macrophage function per se, tumour necrosis
factor-α release from peritoneal macrophages in response to
lipopolysaccharide appeared comparable between Crtc3 mutant and control
cells (Supplementary Fig. 5). In line with these differences, circulating
insulin concentrations were lower in HFD-fed Crtc3−/− than wild-type
mice, and whole-body insulin sensitivity was correspondingly
improved by insulin and glucose tolerance testing (Fig. 2f). As a result,
glucose uptake into muscle was increased in Crtc3−/− mice compared
with control littermates (Supplementary Fig. 6).

We considered that Crtc3 activity in adipose tissue may be modu- 
lated by hormonal signals. In line with its effects on Crtc3 depho- 
sphorylation in cell cultures (Supplementary Fig. 7), intraperitoneal 
administration of the β-adrenergic agonist isoproterenol (ISO) increased
the activity of a CRE-luc reporter transgene in WAT and BAT by live 
imaging analysis (Fig. 3a). Leptin administration (intraperitoneal) 
also promoted Crtc3 dephosphorylation. Crtc3 protein amounts in 
WAT are elevated under ad libitum conditions; they decreased after 
fasting for 6 h, when Crtc3 appeared to undergo degradation (Fig. 3a).
Consistent with an increase in protein stability under obese condi- 
tions, Creb and Crtc3 protein amounts were upregulated in WAT 
from HFD-fed mice compared with those fed on normal chow 
(Supplementary Fig. 8). 

Figure 2 | Increased energy expenditure in Crtc3−/− mice. a, b, Energy
expenditure and oxygen consumption (a) as well as food intake and physical
activity (b) in HFD-fed mice (n=4 per group). c, Free-fatty-acid levels (top) and
haematoxylin and eosin sections of livers (bottom) in HFD-fed mice (n=3 per
group) (**P<0.01). d, Leptin levels (top) (n=5 per group) and effect of
intraperitoneal leptin administration on energy expenditure (bottom) (n=4
per group) (*P<0.05; ***P<0.001). e, Macrophage infiltration (top) and gene
expression (bottom) in WAT from HFD-fed mice. Scale bar, 50 μm. f, Insulin
levels (top), insulin tolerance testing (middle) and glucose tolerance testing
(bottom) of HFD-fed mice (n=5 per group) (*P<0.05; **P<0.01;
***P<0.001). Error bars, s.e.m.

Figure 3 | Increased catecholamine signalling in Crtc3−/− adipose tissue.
a, Top, effect of ISO on CRE-luc reporter activity in different tissues. Bottom,
immunoblots of Crtc3 in WAT (left) or BAT (right). Effect of fasting (Fast),
feeding ad libitum (Ad lib.) and leptin administration indicated. LIV, liver.
b, Haematoxylin and eosin sections (top) and adipocyte size distribution
(bottom) in WAT from wild-type and Crtc3−/− mice. Scale bar, 50 μm.
c, Lipolysis rates in adipocytes exposed to ISO (left) or FSK (right) (n=3)
(*P<0.05; **P<0.01). Veh, vehicle. d, Phospho- (Ser 660) HSL levels in WAT


Catecholamine signalling in adipose tissue 

Under HFD feeding conditions, increases in catecholamine signalling 
maintain energy balance by mobilizing triglyceride stores in WAT”. 
Although the total number of adipocytes in WAT fat pads was nearly 
identical in both groups, adipocytes from Crtc3−/− mice were
substantially smaller than from wild-type mice (Fig. 3b). Arguing against
a disruption in triglyceride synthesis, mRNA amounts for lipogenic 
genes (Acc, Lpl, Scd) appeared comparable between Crtc3 mutant and 
wild-type adipocytes (Supplementary Fig. 9). Rather, basal and ISO- 
induced lipolysis rates were increased in Crtc3−/− compared with

from mice fed on normal chow or HFD after ISO injection (top) and from
HFD-fed wild-type or Crtc3−/− mice (bottom left). Bottom right, immunoblot
of PKA activity in WAT from HFD-fed mice. e, Haematoxylin and eosin
sections (top) and brown adipocyte numbers (bottom) in wild-type and
Crtc3−/− BAT. Scale bar, 50 μm (***P<0.001). f, Top, fatty-acid oxidation
(FAO) and Ucp1 mRNA levels in brown adipocytes. Core body temperatures
indicated (n=4 per group) (**P<0.01). Error bars, s.e.m. F/I, exposure to
forskolin plus isoproterenol.


control adipocytes (Fig. 3c). Exposure to FSK also increased lipolysis 
to a greater extent in Crtc3−/− adipocytes (Fig. 3c), pointing to the
potential upregulation of the cAMP signalling pathway in these cells. 

Triggering of β-adrenergic receptors has been found to promote
lipolysis through the cAMP-dependent PKA-mediated phosphoryla-
tion of hormone sensitive lipase (HSL)’’. In keeping with the pro-
posed downregulation of β-adrenergic receptor signalling in obesity,
administration of ISO had only modest effects on HSL phosphoryla-
tion in HFD-fed animals relative to those fed on normal chow (Fig. 3d).
Indeed, amounts of phospho- (Ser 660) HSL were substantially elevated
in Crtc3−/− WAT compared with wild type, even though circulating
concentrations of noradrenaline and adrenaline were similar between
the two groups (Fig. 3d and Supplementary Fig. 10). PKA activity in
WAT was also increased in Crtc3−/− mice by immunoblot assay using a
phospho-specific PKA substrate antiserum (Fig. 3d). Consistent with
the predominant expression of Crtc3 in adipose tissue, PKA activity in
other tissues appeared similar between wild-type and Crtc3−/− mice
(Supplementary Fig. 11).

Having seen that lipolysis rates are increased in WAT, and realizing
that circulating free-fatty-acid concentrations are reduced in Crtc3
mutant mice, we considered that fatty-acid oxidation should also be
upregulated in this setting. Under HFD conditions, leptin has been
proposed to trigger catecholamine-mediated increases in fat burning in
BAT, a process known as diet-induced thermogenesis*”’. In keeping
with the ability for catecholamines to stimulate BAT expansion, brown
adipocyte numbers were increased twofold in intra-scapular fat pads
from Crtc3−/− mice compared with controls (Fig. 3e). Suggesting a
parallel increase in fat burning, Crtc3−/− brown adipocytes also had
smaller intracellular lipid vacuoles than wild-type cells. Moreover,
fatty-acid oxidation rates were increased in primary brown adipocytes
from Crtc3−/− mice relative to controls, and uncoupling protein 1
(Ucp1) mRNA amounts were also higher (Fig. 3f). Correspondingly,
core body temperatures were elevated in Crtc3−/− mice compared with
control animals. Taken together, these results indicate that loss of 
Crtc3 increases fat burning in part through increases in brown adipo- 
cyte numbers in BAT. 

We reasoned that the loss of Crtc3 expression could increase cellular 
PKA activity by altering the subunit composition of the PKA holoen- 
zyme. Over-expression of the forkhead protein Foxc2 in WAT, for 
example, has been found to promote energy expenditure by upregulat- 
ing mRNA amounts for regulatory subunit I, which has a higher affinity 
for cAMP than RII”®. Arguing against this possibility, however, mRNA 
amounts for regulatory subunits I and II in WAT were comparable 
between wild type and Crtc3 mutants (Supplementary Fig. 12, left). 


cAMP accumulation in CRTC3 mutant cells 


Alternatively, disruption of the Crtc3 gene may enhance PKA activity 
by increasing cellular cAMP accumulation in response to hormonal 
signals. Supporting this idea, cAMP concentrations were elevated in 
WAT from Crtc3 mutant mice relative to controls (Fig. 4a). Exposure 
to FSK also triggered cAMP accumulation to a greater extent in 
Crtc3−/− mouse embryonic fibroblasts (MEFs) and in Crtc3−/−
BAT stromal-vascular cells compared with wild-type cells (Fig. 4a 
and Supplementary Fig. 12, right). Moreover, Crtc3 over-expression 
in wild-type cells reduced cAMP production in response to FSK, 
whereas acute RNA interference (RNAi)-mediated depletion of 
Crtc3 increased it (Fig. 4b and Supplementary Fig. 13). 

In principle, the enhanced accumulation of cAMP in Crtc3-deficient 
cells could reflect a decrease in cellular phosphodiesterase activity. In 
that event, treatment with non-selective phosphodiesterase inhibitor 
should lead to comparable increases in cAMP concentrations between 
wild-type and mutant cells exposed to β-adrenergic agonist. However,
intracellular cAMP concentrations remained higher in Crtc3−/− than
wild-type cells after co-stimulation with ISO plus isobutyl-methyl 
xanthine (Supplementary Fig. 14). Based on these results, we reasoned 
that Crtc3 probably inhibits cAMP signalling by modulating cellular 
adenyl cyclase activity. 

In gene profiling studies to identify cellular genes that mediate 
inhibitory effects of Crtc3 on cAMP signalling, we identified the 
Regulator of G protein signalling 2 (Rgs2) as the most highly upregu- 
lated gene of 15 that are induced threefold or better in adipocytes 
exposed to FSK*'. We confirmed these effects in cultured primary 


Figure 4 | Crtc3 attenuates adipose tissue cAMP signalling. a, cAMP
content in WAT (top) and MEFs (bottom) from wild-type and Crtc3−/− mice
(**P<0.01). b, cAMP accumulation in MEFs over-expressing (top) or depleted
of (bottom) Crtc3 (***P<0.001). c, Rgs2 mRNA levels in cultured adipocytes
exposed to FSK (top) and in WAT from mice fed on normal chow or HFD
(bottom) (*P<0.05; **P<0.01). d, Effect of Rgs2 over-expression (top) or Rgs2
RNAi (bottom) on cAMP accumulation in MEFs (*P<0.05; ***P<0.001).
e, Top, effect of Crtc3 over-expression or Crtc3 RNAi on RGS2-luc reporter
activity. Bottom, chromatin immunoprecipitation assay of Crtc3 occupancy
over the Rgs2 promoter. f, Relative nuclear/cytoplasmic fractionation (top)
and activities (bottom) of over-expressed wild-type and S72N CRTC3 in
HEK293T cells. Error bars, s.e.m.

adipocytes, where exposure to FSK increased the expression of Rgs2
and other targets in wild-type cells, but to a much lesser extent in
Crtc3−/− cells (Fig. 4c, top, and Supplementary Fig. 15). By contrast
with its reduction in adipose tissue from Crtc3−/− mice, Creb target
gene expression in skeletal muscle appeared comparable between
wild-type and Crtc3 mutant animals, probably reflecting regulatory
contributions from Crtc2, which has been shown to promote the
expression of PGC-1α and mitochondrial genes in muscle cells”.

First identified as a GTPase activating protein that blocks Gq sig- 
nalling, Rgs2 has also been shown to inhibit the cAMP pathway by 
binding directly to a subset of adenyl cyclases (types III, V and VI)’, 
isoforms that are enriched in BAT and WAT**°”. Moreover, muta- 
tions that enhance RGS2 expression have been associated with 
increased risk of metabolic syndrome in humans”. In keeping with 
the upregulation of Crtc3 in obesity, HFD feeding stimulated Rgs2 
mRNA amounts in wild-type WAT, but Rgs2 expression remained 
low in WAT from HFD-fed Crtc3−/− mice (Fig. 4c, bottom).
Consistent with its proposed role as an adenyl cyclase inhibitor, 
Rgs2 over-expression reduced cAMP production in cells exposed to 
FSK, whereas RNAi-mediated knockdown of Rgs2 increased it 
(Fig. 4d and Supplementary Figs 16 and 17). 

We examined whether the Rgs2 gene is a direct target of Creb and 
Crtc3. In line with the presence of conserved CREB binding sites at −184
and −66 on the Rgs2 promoter, exposure to FSK upregulated RGS2-
luciferase reporter activity in transient transfection assays (Fig. 4e). Crtc3 
over-expression further enhanced RGS2 promoter activity, whereas 
RNAi-mediated depletion of Crtc3 reduced it. Consistent with a direct 
effect of these activators, exposure to FSK increased Creb and Crtc3 
occupancy over the RGS2 promoter in wild-type cells (Fig. 4e). 


Role of CRTC3 in human obesity 


Having seen the effects of Crtc3 on energy expenditure, we wondered 
whether this coactivator also contributes to obesity in humans. Within 
the human database of single nucleotide polymorphisms (dbSNP), we 
noticed a common CRTC3 variant allele, which encodes a missense 
variant (S72N) near a predicted nuclear export sequence’. Supporting 
this idea, nuclear amounts of 72N CRTC3 were elevated relative to 72S 
CRTC3 under basal conditions (Fig. 4f). Correspondingly, 72N variant 
CRTC3 was more potent than 72S CRTC3 in stimulating RGS2 pro- 
moter activity, particularly under basal conditions (Fig. 4f). 

We examined the potential association between the S72N variant 
CRTC3 and adiposity in a Mexican-American cohort of 779 indivi- 
duals (Table 1). The allele frequency of the 72N variant in this popu- 
lation was 34%. In keeping with its increased activity relative to 
wild-type CRTC3, the 72N allele was also associated with several 
anthropometric indices of adiposity including weight, body mass 
index (BMI) and hip circumference. Similar to the gene dosage effect 
of CRTC3 on weight gain in mice, Mexican-Americans with two 72N 
alleles had increased adiposity compared with those with only one 
variant allele; and 72S/72N heterozygous individuals had intermediate 
adiposity relative to individuals homozygous for the wild-type and 
variant CRTC3 alleles. 
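
The reported 34% allele frequency can be recomputed from the genotype counts given in Table 1 (S/S, n=346; S/N, n=338; N/N, n=95). The sketch below does that, and adds a Hardy-Weinberg chi-square check. The Hardy-Weinberg check is an illustration added here and is not an analysis described in the text; the function names are hypothetical.

```python
def allele_frequency(n_ss, n_sn, n_nn):
    """Frequency of the variant (N) allele from diploid genotype counts."""
    total_alleles = 2 * (n_ss + n_sn + n_nn)
    return (n_sn + 2 * n_nn) / total_alleles


def hardy_weinberg_chi2(n_ss, n_sn, n_nn):
    """Chi-square statistic comparing observed counts with Hardy-Weinberg expectations."""
    n = n_ss + n_sn + n_nn
    p = allele_frequency(n_ss, n_sn, n_nn)  # variant allele
    q = 1.0 - p                             # reference allele
    expected = (q * q * n, 2 * p * q * n, p * p * n)
    observed = (n_ss, n_sn, n_nn)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Genotype counts from Table 1 (MACAD cohort, 779 individuals).
freq_72n = allele_frequency(346, 338, 95)
chi2 = hardy_weinberg_chi2(346, 338, 95)
print(f"72N allele frequency: {freq_72n:.3f}")
print(f"Hardy-Weinberg chi-square (1 d.f.): {chi2:.2f}")
```

The frequency comes out at about 0.34, in agreement with the value quoted in the text, and the chi-square statistic is well below 3.84, the 5% critical value for one degree of freedom, so these counts give no sign of departure from Hardy-Weinberg proportions.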

We then sought to confirm the association of 72N with increases in 
adiposity indices, by assessing the association of a perfect proxy SNP 


Table 1 | Association of S72N with anthropometric indices in the MACAD cohort

                          S/S (n=346)     S/N (n=338)     N/N (n=95)      P value
Weight (kg)               73.3 (19.6)     75.1 (20.6)     76.0 (22.2)     0.033*
BMI (kg/m²)               28.1 (6.1)      28.3 (6.1)      29.1 (5.6)      0.038*
Hip circumference (cm)    103.0 (12.3)    104.5 (13.0)    104.5 (12.8)    0.033*
Waist (cm)                90.5 (15)       93.0 (16)       93.3 (17.3)     0.15
BSA (m²)                  1.77 (0.28)     1.81 (0.30)     1.84 (0.30)     0.075
Waist/hip ratio           0.88 (0.10)     0.89 (0.11)     0.88 (0.12)     0.43

Values are median (interquartile range). S/S, individuals homozygous for wild-type CRTC3 (Ser 72);
S/N, individuals heterozygous for variant CRTC3 (Asn 72); N/N, individuals homozygous for variant
(Asn 72) CRTC3.
* Significant P values.

rs3862434 (r=1 with S72N) with adiposity measures in 987
Mexican-Americans from the Multi-Ethnic Study of Atherosclerosis 
(MESA). The minor allele of rs3862434 (G, frequency 34%), which 
corresponds to 72N, was associated with increased body surface area 
(BSA) and a trend to increased weight (Supplementary Table 1). In an 
analysis combining the two cohorts, 72N exhibited associations with 
weight, hip circumference, BMI and BSA (Supplementary Table 2). 
Because these traits are interrelated, we did not use multiple testing 
correction, which would be overly conservative because our other data 
suggested 72N altered the biological function of CRTC3. Taken 
together, these results indicate that CRTC3 is associated with increased 
obesity risk in Mexican-Americans. 

This association did not extend to 2,527 non-Hispanic whites from 
MESA, in whom we observed no association of rs3862434 with any of 
the six available obesity traits (Supplementary Table 3). Of note, the G 
allele of rs3862434 (proxy for the adiposity-associated 72N allele) is 
the minor allele in Mexican-Americans and the major allele in non-
Hispanic whites. We then conducted a z-score meta-analysis in over 
63,000 subjects from the Cohorts for Heart and Aging Research in 
Genome Epidemiology (CHARGE) and Genetic Investigation of 
ANthropometric Traits (GIANT) consortia, examining the
association of S72N with BMI. In this meta-analysis, the 72N allele was
associated with increased BMI; however, this did not reach statistical 
significance (z-score 0.74, P=0.45). This suggests that the effect of 
72N to promote obesity, although substantial in Mexican-Americans, 
is very weak or non-existent in non-Hispanic whites. As the 72N allele 
is more frequent in non-Hispanic whites, and yet has a minimal effect 
on BMI in this population, our results are consistent with the obser- 
vation that obesity is less frequent in non-Hispanic whites than 
Mexican-Americans’. The weaker effect of 72N in non-Hispanic 
whites may be due to environmental/lifestyle factors and/or differ- 
ences in genetic background. 
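
For readers who want to relate the meta-analysis z-score to its P value, the conversion is sketched below, together with a sample-size-weighted combination of per-cohort z-scores. The weighting scheme (Stouffer's method with square-root-of-n weights) is a common choice for z-score meta-analysis but is an assumption here; the text does not specify which weights the consortia analysis used.

```python
import math


def two_sided_p(z):
    """Two-sided P value for a standard normal test statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def stouffer_z(z_scores, sample_sizes):
    """Combine per-cohort z-scores using sqrt(n) weights (assumed scheme)."""
    if len(z_scores) != len(sample_sizes):
        raise ValueError("need one sample size per z-score")
    weights = [math.sqrt(n) for n in sample_sizes]
    numerator = sum(w * z for w, z in zip(weights, z_scores))
    denominator = math.sqrt(sum(w * w for w in weights))
    return numerator / denominator


p_reported_z = two_sided_p(0.74)
print(f"two-sided P for z = 0.74: {p_reported_z:.2f}")

# Hypothetical two-cohort example, purely to show the combination step.
combined = stouffer_z([1.0, 1.0], [100, 100])
print(f"combined z for two equal cohorts with z = 1.0: {combined:.3f}")
```

A z-score of 0.74 corresponds to a two-sided P of about 0.46, close to the reported P=0.45 and far from conventional significance thresholds.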


Discussion 


HFD feeding has been shown to promote obesity and insulin resistance
through increases in energy intake that lead to the ectopic deposition of 
lipid in liver. Our results suggest that Crtc3 contributes to these 
changes in part by attenuating catecholamine signalling in adipose 
tissue (Supplementary Fig. 18). 

Although the proposed role of Crtc3 as a negative feedback regulator 
of adipocyte cAMP signalling was unexpected, we note that intracellular 
signalling pathways often self-attenuate as part of a homeostatic mech- 
anism to limit cellular responses to hormonal stimuli’. Thus the chronic 
upregulation of sympathetic nerve activity under HFD conditions”® may 
attenuate the intracellular cAMP pathway through the Crtc3-mediated 
induction of Rgs2. By limiting catecholamine-dependent increases in 
lipolysis and fatty-acid oxidation, Crtc3 may also function as a so-called 
‘thrifty’ gene that enhances survival under starvation conditions. Future 
studies may reveal the extent to which CRTC3 also contributes to the 
development of insulin resistance and type II diabetes. 


METHODS SUMMARY 


Crtc3−/− mice were generated by targeted disruption of exon 1 in the Crtc3
gene, which encodes the CREB binding domain. Transgenic CRE-luciferase 
reporter mice were generated and analysed by in vivo imaging with an IVIS- 
100 instrument (Caliper Life Sciences). For indirect calorimetry studies, mice 
were housed individually; oxygen consumption and energy expenditure were 
measured using a LabMaster system (TSE Systems). Circulating leptin, insulin 
and free-fatty-acid concentrations were measured by enzyme-linked
immunosorbent assay (ELISA). In vitro lipolysis and fatty-acid oxidation rates
were evaluated on cultures of primary white and brown adipocytes as described*’°;
glucose uptake was measured in soleus muscle using [U-¹⁴C]2-deoxyglucose. PKA
activity was examined in different tissues by immunoblot assay with phospho-PKA
substrate antiserum (Cell Signaling). Intracellular cAMP accumulation in wild- 
type and CRTC3 mutant cells was measured by ELISA. Crtc3 and Creb occupan- 
cies over cellular genes were evaluated by chromatin immunoprecipitation assay". 
For studies of human genetic association, that of CRTC3 variant (S72N) with 
adiposity parameters was assessed in participants of the Cedars-Sinai/University of
California, Los Angeles, Mexican-American Coronary Artery Disease (MACAD)
study, comprising 779 subjects‘. Confirmatory studies were performed on
Mexican-American subjects from the Multi-Ethnic Study of Atherosclerosis
(MESA), containing 987 Mexican-American individuals. Genotyping of SNP
rs8033595 (S72N) in MACAD was performed using TaqMan MGB technology”.
This SNP was also explored for association with body mass index in the CHARGE 
and GIANT consortia. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 


METHODS


Animal studies. Mice were housed in a temperature-controlled environment 
under a 12h light:dark cycle with free access to water and standard rodent chow 
diet (LabDiet 5001). For HFD studies, 6- to 8-week-old mice were transferred to a
60% HFD (Research Diets, D12492). Magnetic resonance imaging scans for fat
and lean mass were performed using an Echo MRI-100 instrument according to
the manufacturer’s instructions. ISO (10 mg kg⁻¹) and leptin (3 mg kg⁻¹; once a
day for 3d) were administered intraperitoneally. All animal procedures were 
performed with an approved protocol from the Salk Institute Animal Care and 
Use Committee. 

Crtc3−/− mice. The targeting vector was constructed by replacing exon 1 of the
Crtc3 gene, which encodes the CREB binding domain, with a phosphoglycerol 
kinase-neomycin selection cassette. The vector also contained a phosphoglycerol 
kinase-diphtheria toxin-A cassette for negative selection. The targeting vector 
was linearized and electroporated into R1 embryonic stem cells. G418-resistant
clones were screened for homologous recombination by Southern blot analysis.
Crtc3−/− mice were backcrossed to C57/BL6 for up to three generations for
metabolic studies. 

CRE-luciferase transgenic mice. A cassette of eight tandem full CREB binding 
sites (CRE) in the context of a minimal CFTR promoter (−126 base pairs of 5’
flanking sequence from the human CFTR gene) was amplified by PCR from lysed 
recombinant CRE-luc adenovirus'’. The CRE-luciferase transgene with flanking 
H19 insulator sequences was microinjected into 129/Sv oocytes for implantation 
into pseudo-pregnant female mice. 

Transgenic founders were identified by PCR analysis of genomic DNA. 
Founder lines were backcrossed onto albino C57/BL6 mice (C57BL/6-Tyrc-2J,
Jackson) for three generations. To maintain consistent copy number, hemizygous 
transgenic mice were bred to wild-type mice; the transgene was segregated to 50% 
of offspring in each litter and was stable for at least three generations. In vivo 
luciferase activity was measured by intra-vital imaging. Mice were anaesthetized 
using isoflurane gas (2% to effect), injected with D-luciferin (150 mg kg⁻¹,
intraperitoneally) and imaged in an IVIS-100 instrument (Caliper Life Sciences) for
1-5 min. 

Genotyping. Genomic DNA was prepared from tail biopsies and genotyped 
using the following primer sets: CRTC3 wild-type allele,
CCTGAGTTATTGGCGGATGT and CACTCAGGCTGTAGCAAGCA; CRTC3 knockout allele,
ATGGAAGGATTGGAGCTACG and CACTCAGGCTGTAGCAAGCA; CRE-luciferase
transgene, GCTGGGCGTTAATCAGAGAG and TTTTCCGTCATCGTCTTTCC.
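
As a quick sanity check on the primer sequences listed above (with line-wrapped sequences joined), the sketch below reports length, GC content and reverse complement for each oligonucleotide. The dictionary keys are labels invented for this example; the text does not state which primer of each pair is forward or reverse, so they are simply numbered in the order given.

```python
PRIMERS = {
    "Crtc3 wild-type, primer 1": "CCTGAGTTATTGGCGGATGT",
    "Crtc3 knockout, primer 1": "ATGGAAGGATTGGAGCTACG",
    "Crtc3 shared primer 2": "CACTCAGGCTGTAGCAAGCA",
    "CRE-luciferase, primer 1": "GCTGGGCGTTAATCAGAGAG",
    "CRE-luciferase, primer 2": "TTTTCCGTCATCGTCTTTCC",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def gc_content(sequence):
    """Fraction of bases that are G or C."""
    return sum(base in "GC" for base in sequence) / len(sequence)


def reverse_complement(sequence):
    """Reverse complement of an upper-case DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(sequence))


for name, seq in PRIMERS.items():
    print(f"{name}: {len(seq)} nt, GC {gc_content(seq):.0%}, "
          f"reverse complement {reverse_complement(seq)}")
```

All five oligonucleotides are 20 nucleotides long once the wrapped lines are joined, which supports reading the split sequences as single primers.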

Histology. Mouse tissues were fixed in zinc-buffered formalin (Anatech) and 
paraffin embedded. Sections (5 μm) were used for haematoxylin and eosin
staining or immunohistochemistry. For immunohistochemical staining of adipose
tissue macrophages, rehydrated antigen retrieved sections were incubated with 
F4/80 (Serotec) antiserum and visualized by the avidin—biotin-complex method 
using the chromogen diaminobenzidine (Vector Labs). 

Immunoblot and chromatin immunoprecipitation. Antibodies for CRTC3, 
phospho-HSL and phospho-PKA substrate were obtained from Cell Signaling. 
Immunoblots were performed as previously described’*. Chromatin immuno- 
precipitation studies were performed as previously described”. 

Cell counting. Images of sections (5 μm) stained with haematoxylin and eosin
were taken at ×200 magnification (1,300 pixels × 1,030 pixels per picture). The
National Institutes of Health ImageJ program was used to perform cell counts on
brown adipose tissue sections. 

GTT, ITT. For glucose tolerance testing, mice were fasted for 16 h and then injected
with glucose (2 g kg⁻¹, intraperitoneally). For insulin tolerance testing, mice were
fasted 2-3 h and injected with insulin (Humulin; 1 U kg⁻¹, intraperitoneally).

Metabolites. Mouse blood was collected from the tail vein and glucose levels were
measured with a One Touch Ultra Glucometer (Johnson & Johnson). Circulating
insulin (Cayman), leptin (Millipore), adrenaline/noradrenaline (LDN GmbH&Co.
KG) and free fatty acid (WAKO Chemicals) levels were determined by ELISA.

Core body temperature. Core body temperature was measured with a thermistor
thermometer (Model 8402-10, Cole Parmer Thermometers).

cAMP. Tissue and cellular cAMP concentrations were determined using a cAMP
ELISA kit according to the manufacturer’s instruction (R&D Systems or Cayman
Chemical Company). Cells were exposed to FSK (10 μM) for times indicated.

Indirect calorimetry. Mice were individually housed for at least 3 days before
calorimetry experiments. Food intake, locomotor activity, oxygen consumption
and carbon dioxide production were simultaneously measured for individually
housed mice with a LabMaster system (TSE Systems). Data were collected for 2-3
days and analysed. For leptin studies, mice were treated with saline or leptin
(3 μg g⁻¹, intraperitoneally) 90 min before the onset of the dark cycle.

Metabolic studies. Rates of fatty-acid oxidation and lipolysis were measured as
previously described*™°. Basal rates of glucose uptake were determined by measuring
the rate of [U-¹⁴C]2-deoxyglucose uptake by isolated soleus muscle. Pairs of soleus
muscles were rapidly dissected from wild-type or CRTC3−/− mice. Isolated soleus
strips were incubated twice for 15 min at 37 °C in continuously gassed (95% O2, 5%
CO2) Krebs-Henseleit bicarbonate buffer containing 5.5 mM glucose and 0.5 mM
oleate complexed to 2% fatty-acid-free bovine serum albumin (Millipore). After the
30 min pre-incubation period, muscle strips were transferred into fresh identical
buffer with [U-¹⁴C]2-deoxyglucose (2 μCi ml⁻¹). Soleus strips were continuously
gassed and incubated at 37 °C for 30 min. At the end of the experiment, individual 
soleus strips were washed three times with ice-cold Hank’s balanced salt solution. 
Muscle strips were weighed and incubated for 1 h at 55 °C in digestion buffer (50 mM 
Tris, pH 8; 100 mM NaCl, 100 mM EDTA, 1% SDS and 500 μg ml⁻¹ proteinase K).
All of the digested tissue was transferred into scintillant (EcoLite) and the radio- 
activity was measured by liquid scintillation counting. 

MEFs. Mouse embryos were obtained from gravid female mice at embryonic days 
13-14. Embryos were minced, trypsinized and washed with PBS. MEFs were 
plated in DMEM with 10% FBS, and 1% penicillin-streptomycin. 

Primary adipocyte and stromal vascular fraction cultures. Primary adipocytes 
and the stromal vascular fraction were isolated from epididymal WAT and BAT, 
as described previously**”°. Primary adipocytes fractions were plated in DMEM 
containing 5.5mM glucose, 2% fatty-acid-free bovine serum albumin and 1% 
penicillin-streptomycin. For stromal vascular fraction cells, pellets from adipo- 
cyte isolations were washed three times with HDB, and cultured in DMEM with 
10% fetal bovine serum and 1% penicillin-streptomycin. 

RNA studies. Total RNA was isolated using Trizol (Invitrogen) and the RNeasy Mini Kit
(Qiagen). Total RNA (1-2 μg) was used for complementary DNA synthesis with
Superscript II according to the manufacturer’s instructions (Invitrogen). Relative
mRNA amounts were determined by real-time quantitative PCR on a LightCycler 480
instrument (Roche).

Statistics. Data are presented as means ± s.e.m. Statistical analysis was per-
formed using an unpaired t test with GraphPad Prism software. Statistical sig-
nificance is indicated as *P < 0.05, **P < 0.01 and ***P < 0.001. All transient
luciferase assays were performed on at least three independent occasions. 
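
As an illustration of the summary statistics described above, the following Python sketch computes a mean ± s.e.m. and an unpaired t statistic. It assumes the equal-variance (Student) form of the test; the measurements are hypothetical and the function names are illustrative only, not part of the GraphPad Prism workflow used here.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample s.d. divided by sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


def unpaired_t(a, b):
    """Student's unpaired t statistic with pooled variance.

    Returns (t, degrees of freedom).
    """
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    t = (mean(a) - mean(b)) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    return t, na + nb - 2


# Hypothetical measurements (arbitrary units), for illustration only.
wild_type = [10.0, 12.0, 11.0, 13.0]
knockout = [14.0, 15.0, 13.0, 16.0]

t_stat, dof = unpaired_t(wild_type, knockout)
print(f"wild type: {mean(wild_type):.2f} ± {sem(wild_type):.2f}")
print(f"knockout:  {mean(knockout):.2f} ± {sem(knockout):.2f}")
print(f"t = {t_stat:.3f}, d.f. = {dof}")
```

The P value then follows from the t distribution with the reported degrees of freedom.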
Human subjects. Associations with adiposity parameters (weight, BMI, waist 
circumference, hip circumference, waist/hip ratio) were first assessed in partici- 
pants in the Cedars-Sinai/University of California, Los Angeles MACAD study, a 
study of Mexican-American families from Los Angeles*'’. In the present report, 
206 two-generation Mexican-American families were included, comprising 779 
subjects (adult offspring of probands with coronary artery disease and the spouses 
of those offspring) who underwent anthropometric measurements and genotyp- 
ing. By design, the offspring were free of diabetes and clinically manifest cardio- 
vascular disease, thus avoiding secondary changes in phenotype caused by overt 
disease. All studies were approved by Human Subjects Protection Institutional 
Review Boards at University of California, Los Angeles and Cedars-Sinai Medical 
Center. 

Confirmatory studies were undertaken in Mexican-American subjects from 
the MESA. A detailed description of the MESA study design and methods has 
been published previously”. Briefly, 6,814 participants 45-84 years of age who 
identified themselves as white (2,748), black (1,930), Hispanic/Latino (1,496) or 
Chinese (806) were recruited from six US communities between 2000 and 2002. 
To obtain a replication cohort most similar to that of MACAD, we studied MESA 
Hispanics with exclusion of those recruited from the New York site, as the latter 
are mainly from the Caribbean and may thus have genetic differences from the 
Mexican-Americans of MACAD*. This resulted in a cohort of 987 MESA 
Mexican-Americans. 

To determine whether the genetic associations observed in Mexican- 
Americans would also be seen in other ethnic groups, we also examined 2,527 
non-Hispanic white subjects from MESA who had available anthropometric data. 
We then also accessed data from two large consortia of non-Hispanic whites, 
CHARGE (n = 31,373)*° and GIANT (n = 32,504)”, both of which had con- 
ducted genome-wide association studies of BMI. These datasets did not overlap 
in subjects. 

Genotyping of human samples. Genotyping of SNP rs8033595 (S72N) in MACAD 
was performed using TaqMan MGB technology as previously described’***. The 
genotyping success rate was 98.3%. In the MESA Mexican-American and white 
cohorts, S72N was represented by a proxy SNP, rs3862434 (A/G, r² = 1 with S72N
in the Mexican-American cohort of the phase III HapMap data, and r² = 1 in the
Caucasian European cohort of the phase II HapMap), that was directly genotyped
(not imputed) in the genome-wide association study conducted in MESA. In 
CHARGE and GIANT, rs8033595 was either directly genotyped or imputed, 
depending on the genome-wide association study arrays used in the individual 
cohorts comprising each consortium. 

Human genetic association analysis. The MACAD cohort is composed of small 
families and marrying-in spouses. Therefore the generalized estimating equation 
(GEE (ref. 48)) approach was selected to use data from all phenotyped subjects.
We evaluated association using this robust variance estimation approach to test 
hypothesized associations between phenotypes and genotypes while accounting 
for familial correlations present in the data. The PROC GENMOD procedure in 
SAS (version 9.0, SAS Institute) was used for the association analysis in which a 
sandwich estimator was used assuming exchangeable correlation. Family was 
taken as the cluster factor, i.e. members from the same family were assumed to
be correlated. The adiposity traits were log transformed to better approximate
conditional normality and homogeneity of variance. An additive genetic model
was assumed in all the association analyses. Analyses used age and sex as covariates, 
unless otherwise specified. The same analytical techniques were used to assess 
association of rs3862434 with adiposity traits within MESA Mexican-Americans 
and non-Hispanic whites. We also conducted an analysis combining the MACAD 
and MESA Mexican-Americans, in which 72N and the minor allele of rs3862434 
were considered the same. In CHARGE and GIANT, we conducted a weighted
z-score-based meta-analysis of association of rs8033595 with BMI, combining P 
values obtained from a similar z-score-based meta-analysis conducted in each 
consortium. The meta-analysis was performed using the program METAL 
(http://www.sph.umich.edu/csg/abecasis/metal/). 
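
The weighted z-score combination described above can be sketched as follows. This is a minimal illustration of the sample-size-weighted scheme (weights equal to the square root of each sample size), not the METAL implementation itself. The sample sizes are those quoted in the text, whereas the P values and effect directions are hypothetical, and the function names are illustrative.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def p_to_z(p_two_sided, effect_sign):
    """Convert a two-sided P value and an effect direction (+1 or -1) to a signed z-score."""
    return effect_sign * _STD_NORMAL.inv_cdf(1.0 - p_two_sided / 2.0)


def weighted_z_meta(studies):
    """Combine studies given as (two-sided P, effect sign, sample size).

    Returns (combined z, combined two-sided P).
    """
    weights = [math.sqrt(n) for _, _, n in studies]
    zs = [p_to_z(p, sign) for p, sign, _ in studies]
    z = sum(w * zi for w, zi in zip(weights, zs)) / math.sqrt(sum(w * w for w in weights))
    p = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(z)))
    return z, p


# Sample sizes are from the text; P values and effect directions are hypothetical.
studies = [
    (0.04, +1, 31_373),  # CHARGE
    (0.10, +1, 32_504),  # GIANT
]
z, p = weighted_z_meta(studies)
print(f"combined z = {z:.3f}, combined P = {p:.4f}")
```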
Origin of Saturn’s rings and inner moons by mass
removal from a lost Titan-sized satellite 


Robin M. Canup¹


The origin of Saturn’s rings has not been adequately explained. The 
current rings are more than 90 to 95 per cent water ice’, which 
implies that initially they were almost pure ice because they are 
continually polluted by rocky meteoroids’. In contrast, a half-rock, 
half-ice mixture (similar to the composition of many of the satellites 
in the outer Solar System) would generally be expected. Previous 
ring origin theories invoke the collisional disruption of a small 
moon’, or the tidal disruption of a comet during a close passage 
by Saturn’. These models are improbable and/or struggle to account 
for basic properties of the rings, including their icy composition. 
Saturn has only one large satellite, Titan, whereas Jupiter has four 
large satellites; additional large satellites probably existed originally 
but were lost as they spiralled into Saturn®. Here I report numerical 
simulations of the tidal removal of mass from a differentiated, 
Titan-sized satellite as it migrates inward towards Saturn. Planetary 
tidal forces preferentially strip material from the satellite’s outer icy 
layers, while its rocky core remains intact and is lost to collision with 
the planet. The result is a pure ice ring much more massive than 
Saturn’s current rings. As the ring evolves, its mass decreases and 
icy moons are spawned from its outer edge’ with estimated masses 
consistent with Saturn’s ice-rich moons interior to and including 
Tethys. 

The Jovian and Saturnian regular satellites are believed to have 
formed within circumplanetary disks of gas and solids produced during 
the end stages of nebular gas inflow to the planets®**°. Recent work 
predicts that multiple generations of Titan-sized satellites formed and 
were lost to collision with Saturn®. Each satellite grows no larger than a 
critical mass, at which point its orbit spirals into the planet owing to 
density wave interactions with the gas disk° (see Supplementary 
Information). This critical mass is comparable to that of Titan (mass 
$M_T = 1.35 \times 10^{26}$ g and solid-body radius $R_T = 2,575$ km)°. The overall
process produces a final satellite system with either several large satellites 
(like Jupiter’s Galilean satellites) or a single large satellite (like Saturn’s 
Titan)*’. A Saturn-like system (Table 1 and Fig. 1) can be produced 
when large, Titan-sized interior satellites spiral into the planet and are 
lost as gas inflow ends°®. 

A large satellite on an approximately circular orbit becomes unstable 
interior to a distance known as the classical Roche limit, $a_R = 2.456\,R_S(\rho_S/\rho)^{1/3}$,
where $R_S = 58,232$ km and $\rho_S = 0.687$ g cm$^{-3}$ are Saturn’s current
mean radius and density, and $\rho$ is the satellite’s mean density. For a Titan-
like mean density ($\rho_T = 1.88$ g cm$^{-3}$), $a_R = 1.76\,R_S$. What happens
once a large satellite drifts within $a_R$ depends on its interior structure.
An undifferentiated, uniform composition satellite disrupts completely. 
However, by the time a large ice-rock satellite approached the Roche 
limit, it would probably have undergone substantial ice melting and 
thus have a differentiated interior. 
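
As a quick numerical check of the Roche limit quoted above, the following Python sketch evaluates the classical expression with the values given in the text. The function and constant names are illustrative.

```python
RHO_SATURN = 0.687   # g cm^-3, Saturn's current mean density (from the text)
ROCHE_COEFF = 2.456  # coefficient of the classical Roche limit (from the text)


def roche_limit(rho_satellite, rho_planet=RHO_SATURN):
    """Classical Roche limit in units of the planet's mean radius."""
    return ROCHE_COEFF * (rho_planet / rho_satellite) ** (1.0 / 3.0)


# Titan-like mean density of 1.88 g cm^-3
print(f"a_R = {roche_limit(1.88):.2f} R_S")
```

A denser satellite can approach the planet more closely before becoming unstable, which is why the stripping of low-density ice matters in what follows.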

The energy associated with forming a Titan-sized satellite heats its 
interior to near the melting point for ice and may cause partial melting”. 
In addition, for even slightly non-circular orbits, the time-varying dis- 
tortion of the satellite’s shape by the planet heats the satellite’s interior at 
a rate! $dE/dt \approx (21/2)\,M_P\,\Omega^3 r^2 (R/r)^5 (k_2/Q)\,e^2$, where $M_P$ is the planet’s
mass, $\Omega$ and $r$ are the satellite’s orbital frequency and orbital radius, $R$ is


the satellite’s radius, (k/Q) is the ratio of the satellite’s Love number to 
its tidal dissipation factor (an uncertain quantity but plausibly within 
the range!! 10° * < (k,/Q) < 10” '; Supplementary Information), and 
$e$ is the satellite’s eccentricity. For example, with $e = 0.002$ (Fig. 1
legend; Supplementary Information), $r = 1.7\,R_S$, $(k_2/Q) = 10^{-2}$ and
$R = R_T$, tidal heating dissipates about $5 \times 10^{9}$ erg g$^{-1}$ in a Titan-mass
satellite ($M = M_T$) over an estimated orbital decay timescale of about
$10^{4}$ years (Supplementary Information), comparable to the latent heat
of fusion of water ice ($3 \times 10^{9}$ erg g$^{-1}$). As ice melts, higher-density
rock initially contained within the ice rapidly descends to the satellite’s 
centre, so that melting in a satellite’s outer layers creates an outer pure- 
ice mantle overlying a more rock-rich core’*’*. For $M = M_T$, the sepa-
ration of rock from ice becomes energetically self-sustaining’? once
about 50% of the rock has migrated to the satellite’s centre’*"*, and in 
this case the satellite differentiates into an ice mantle and a core that is 
pure rock or rock and metal. 
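
The tidal heating example above can be reproduced with a short Python sketch. It assumes the heating rate has the standard form $dE/dt \approx (21/2)\,M_P\Omega^3 r^2 (R/r)^5 (k_2/Q)\,e^2$, with $(k_2/Q) = 10^{-2}$ and a decay timescale of $10^4$ years; these exponents are read from a partly illegible passage and should be treated as assumptions. Constants are in cgs units and the names are illustrative.

```python
import math

G = 6.674e-8             # gravitational constant, cgs
M_SATURN = 5.69e29       # g
R_SATURN = 5.8232e9      # cm, Saturn's mean radius (58,232 km)
M_TITAN = 1.35e26        # g
R_TITAN = 2.575e8        # cm (2,575 km)
SECONDS_PER_YEAR = 3.156e7


def tidal_heating_rate(r, ecc, k2_over_q, m_planet=M_SATURN, r_sat=R_TITAN):
    """Tidal heating rate (erg s^-1) of a satellite on a slightly eccentric orbit."""
    omega = math.sqrt(G * m_planet / r**3)  # orbital frequency
    return 10.5 * m_planet * omega**3 * r**2 * (r_sat / r) ** 5 * k2_over_q * ecc**2


rate = tidal_heating_rate(r=1.7 * R_SATURN, ecc=0.002, k2_over_q=1e-2)
# Energy deposited per gram over an assumed 1e4-year orbital decay
heat_per_gram = rate * 1e4 * SECONDS_PER_YEAR / M_TITAN
print(f"{heat_per_gram:.1e} erg/g")
```

Under these assumptions the deposited energy is a few times $10^{9}$ erg g$^{-1}$, of the same order as the latent heat of fusion of water ice.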

When a differentiated satellite drifts within its Roche limit, tides first 
strip material from its outer, lower-density layers. The removal of low- 
density material causes the satellite’s mean density to increase until the 
remnant satellite is marginally stable. As the satellite spirals inward, this 
process regulates $\rho$ to approximately the local critical value for stability
at the satellite’s semi-major axis $a$, $\rho_{\rm crit} = \rho_S (2.456)^3 (R_S/a)^3$, until the
remnant satellite either collides with the planet or fully disrupts.

As a simple example, consider a two-layer satellite with an ice mantle
overlying a core of uniform density $\rho_{\rm core}$. The satellite’s initial mean
density $\rho_0$ determines the distance at which the tidal loss of ice begins
[$a_{\rm max} = a_R(\rho = \rho_0)$], while $\rho_{\rm core}$ sets the distance at which the core
disrupts [$a_{\rm rock} = a_R(\rho = \rho_{\rm core})$]. The satellite sheds only ice across the
region:


Table 1 | Saturn’s rings and inner moons

Object        a/R_eq     Mass (10^22 g)   Mean radius (km)   Density (g cm^-3)
B ring        1.5-1.9    2-10             -                  -
A ring        2.0-2.3    0.6              -                  -
Epimetheus    2.51       0.06             58.3               0.63
Janus         2.51       0.20             90.4               0.61
Mimas         3.09       3.75             198                1.15
Enceladus     3.95       10.8             252                1.61
Tethys        4.89       61.7             533                0.97
Dione         6.26       105              562                1.48
Rhea          8.74       230              764                1.23


Shown are key properties***’ of Saturn’s most massive rings and inner moons with mean radii >50 km. 
Here orbital radii a are scaled by Saturn’s equatorial radius according to convention 
($R_{\rm eq}$ = 60,268 km = 1.035$R_S$). The total mass in the main rings, contained primarily in the B ring, is
estimated to be** a few times $10^{22}$ g to $10^{23}$ g. Saturn’s inner satellites interior to and including Tethys
are, as a group, unusually ice-rich, with a mass-weighted average density $\rho \approx 1.07$ g cm$^{-3}$. Disruptive
collisions* and/or endogenic activity in the case of Enceladus*’ could have removed ice relative to rock 
owing to ice’s higher volatility, so that the current inner satellite compositions probably provide lower 
limits on their initial ice fraction. Moons orbiting exterior to Saturn’s synchronous radius evolve outward 
owing to tidal interaction with the planet. This implies that the moons interior to and including Tethys 
could have all been interior to?® about 4Rs when they formed, consistent with their having been 
spawned from the outer edge of the rings. The model here proposes that Tethys, Enceladus and Mimas 
(or their progenitors) were spawned from a primordial massive ring as it spread diffusively and 
delivered material to the region outside the Roche limit. A similar process appears to be ongoing today, 
with the smaller inner moons (interior to and including Janus) probably forming in the last 10° to 107 
years as a result of recent spreading of the A ring’. More distant Dione and Rhea have been exterior to 
about 6Rs throughout their history®, and thus probably formed independently of the rings. 
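
The mass-weighted average density quoted in the note to Table 1 can be checked directly from the tabulated values. The Python sketch below uses the masses and densities of the moons interior to and including Tethys; the names are illustrative.

```python
# (mass in 10^22 g, density in g cm^-3), taken from Table 1
INNER_MOONS = {
    "Epimetheus": (0.06, 0.63),
    "Janus": (0.20, 0.61),
    "Mimas": (3.75, 1.15),
    "Enceladus": (10.8, 1.61),
    "Tethys": (61.7, 0.97),
}


def mass_weighted_density(moons):
    """Average of the densities, weighted by each moon's mass."""
    total_mass = sum(mass for mass, _ in moons.values())
    return sum(mass * rho for mass, rho in moons.values()) / total_mass


print(f"{mass_weighted_density(INNER_MOONS):.2f} g cm^-3")
```

Tethys dominates the average because it holds roughly 80% of the group’s mass.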


¹Planetary Science Directorate, Southwest Research Institute, 1050 Walnut Street, Suite 300, Boulder, Colorado 80302, USA.
Figure 1 | Results of a satellite accretion simulation’ that produced a
Saturn-like system of satellites. The satellite disk is supplied by an inflow of gas
and solid particles from solar orbit, the rate of which decays exponentially with
a time constant $\tau_{\rm in}$ that is comparable to the solar nebula lifetime. Satellites
undergo inward type I migration on a timescale proportional to $(M\sigma_g)^{-1}$,
where $M$ is satellite mass and $\sigma_g$ is the disk gas surface density, with the latter
proportional to the inflow rate°*. Black circles show the simulated satellites
(with horizontal lines proportional to orbital eccentricities); Saturn’s satellites
are shown as green stars. a, Multiple satellites form as solid material flows into
the disk (t = 0.2 T;,). Once satellites grow to a mass M of a few times 10° 
planetary masses, they begin to migrate inward. b, At $t = 1.4\,\tau_{\rm in}$, the system
resembles Jupiter's Galilean system, with four similarly sized large satellites. 
c, The inner three large satellites are lost to collision with Saturn, with the last 
For a rock or a rock-metal core, $\rho_{\rm core}$ would be about 3.0 to 3.5 g cm$^{-3}$
(the latter is the density of Jupiter’s rocky satellite, Io), while $\rho_0 \approx \rho_T$ is
expected.
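
To make the ice-shedding region concrete, the following Python sketch evaluates the Roche distance for the initial mean density and for the two core densities quoted above. It is an illustration based on the classical Roche expression given earlier; the function names are hypothetical.

```python
RHO_SATURN = 0.687   # g cm^-3, Saturn's current mean density
ROCHE_COEFF = 2.456  # coefficient of the classical Roche limit


def roche_distance(rho):
    """Distance (in Saturn mean radii) inside which a body of mean density rho is unstable."""
    return ROCHE_COEFF * (RHO_SATURN / rho) ** (1.0 / 3.0)


def ice_shedding_zone(rho_initial, rho_core):
    """Return (a_rock, a_max); only ice is shed while a_rock < a < a_max."""
    return roche_distance(rho_core), roche_distance(rho_initial)


for rho_core in (3.0, 3.5):
    a_rock, a_max = ice_shedding_zone(rho_initial=1.88, rho_core=rho_core)
    print(f"rho_core = {rho_core}: {a_rock:.2f} R_S < a < {a_max:.2f} R_S")
```

The zone narrows only slightly as the core density rises, because the Roche distance scales as the inverse cube root of density.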

Planet contraction models¹⁴,¹⁵ predict that 2 to 5 Myr after the peak
rate of its gas accretion, the young Saturn’s radius Rp would have had a 
value between about 1.6Rs and 1.5Rs. Mean observed nebular lifetimes’® 
are about 3 Myr, so that the end of gas inflow to Saturn and the loss of 
the final large satellites interior to Titan would probably occur in this 
timeframe. For $a_{\rm rock} < R_P < a_{\rm max}$, a satellite will collide with the planet
before its rocky core disrupts, so that it tidally sheds only ice. Although 
each lost satellite may have produced tidal debris (depending on the 
state of the planet at that time), such material would generally be 
removed through collision with the planet as it was shepherded by 
subsequent satellites migrating inward’’ or perhaps driven to high 
eccentricities by not-too-distant large satellites'®. However, tidal debris 
from the last large satellite to be lost from the Saturnian system could 
survive. 

I use smooth particle hydrodynamics (SPH) to simulate tidal strip- 
ping from Titan-sized satellites (Fig. 2; Supplementary Information). 
The simulations predict initial tidal fragment orbital eccentricities of
$e_p \sim 0.1$. Fragments would have radii of about 1 to 50 km, depending
on the tensile strength of the satellite’s outer ice shell’ (Supplementary
Information). Subsequent collisions between fragments occur with a
characteristic velocity $v \sim e_p r\Omega$, where $\Omega = [GM_S/r^3]^{1/2}$ is the orbital
frequency, $M_S = 5.69 \times 10^{29}$ g is Saturn’s mass, $G$ is the gravitational
constant, and $r\Omega$ is the orbital velocity at radius $r$, which is about
large satellite lost at time $t \sim 3\,\tau_{\rm in}$, when the inflow rate has slowed substantially
and the disk gas density has decreased to $\sigma_g \sim 10$ g cm$^{-2}$. The last lost satellite
acquires most of its mass at distances of 25 to 30 planetary radii, where disk 
temperatures were low enough for water ice, implying that the satellite would 
have a Titan-like composition with approximately 50% rock and 50% ice. 

d, The final system at $t = 10\,\tau_{\rm in}$ has a single large Titan-like satellite at $a \sim 15$
planetary radii. The satellite eccentricities in these simulations reflect a balance 
between mutual gravitational interactions and eccentricity damping by density 
waves, yielding an average final eccentricity for the large satellites of 

<e> ~ 0.02 (ref. 6). Inclusion of eccentricity damping by tides raised on the 
satellites by the planet is estimated to reduce this to $\langle e\rangle$ of a few times $10^{-3}$
(Supplementary Information). 


20 km s$^{-1}$ at $r = 1.7\,R_S$. The collision energy per unit mass, $v^2/
2 \sim 10^{10}$ erg g$^{-1}$, exceeds that needed to catastrophically disrupt’? ice
objects with radii of 1 km to 50km(~10° ergg ‘toa few 10° ergg ‘), 
so that collisions shatter the fragments into small particles. 
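
The orbital velocity and collision energy quoted above follow directly from the stated relations. The Python sketch below evaluates them in cgs units for $e_p = 0.1$ and $r = 1.7\,R_S$; the names are illustrative.

```python
import math

G = 6.674e-8         # gravitational constant, cgs
M_SATURN = 5.69e29   # g
R_SATURN = 5.8232e9  # cm, Saturn's mean radius


def orbital_velocity(r_cm):
    """Circular orbital velocity r*Omega = sqrt(G M_S / r), in cm s^-1."""
    return math.sqrt(G * M_SATURN / r_cm)


def collision_energy_per_gram(e_p, r_cm):
    """Specific collision energy v^2/2 for v ~ e_p * r * Omega, in erg g^-1."""
    v = e_p * orbital_velocity(r_cm)
    return 0.5 * v * v


r = 1.7 * R_SATURN
print(f"orbital velocity: {orbital_velocity(r) / 1e5:.1f} km/s")
print(f"collision energy: {collision_energy_per_gram(0.1, r):.1e} erg/g")
```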

Mutual collisions also rapidly drive the particles into a ring with 
nearly circular and co-planar orbits (Supplementary Information). To 
estimate the total mass of ice produced, I compute the equivalent 
circular orbit, $a_{\rm eq} = a_p(1 - e_p^2)$, for each SPH particle having the same
angular momentum as its initial orbit with semi-major axis $a_p$ and
eccentricity $e_p$. I then compare $a_{\rm eq}$ to an estimate (Supplementary
Information) of the ice stability distance $a_{\rm ice}$, obtained by balancing
heating of the disk by the planet’s luminosity, occurring with a rate per
unit area of the disk $\dot{E}_P \sim (18/7)\,\sigma_{SB} T_P^4 (R_P/r)^2 (c/r\Omega)$, with radiative
cooling from the disk surfaces, occurring with a rate per unit area of the
disk $\dot{E}_{\rm rad} = 2\sigma_{SB} T_d^4$. Here $\sigma_{SB}$ is the Stefan-Boltzmann constant, $T_P$
and $R_P$ are the planet’s temperature and radius determined by planet
contraction models¹⁴,¹⁵, $c \propto T_d^{1/2}$ is the speed of sound in the gas, and $T_d$
is the disk temperature at radius $r$. Setting $\dot{E}_P = \dot{E}_{\rm rad}$ and solving for the
distance at which $T_d = 200$ K gives $a_{\rm ice}/R_S \sim 1.6\,(T_P/400\,{\rm K})^{8/3}(R_P/
1.5R_S)^{4/3}$. As a Titan-like satellite orbit spirals towards the planet, up
to a few times 10°’ g of thermally stable ice particles is produced 
(Table 2). 
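
The bookkeeping described above, in which each particle is assigned the circular orbit with the same angular momentum and then compared against the ice stability distance, can be sketched in Python as follows. The particle list is hypothetical and the function names are illustrative; the SPH output itself is not reproduced here.

```python
def equivalent_circular_radius(a_p, e_p):
    """Radius of the circular orbit with the same specific angular momentum.

    Angular momentum scales as sqrt(a (1 - e^2)), so a_eq = a_p (1 - e_p^2).
    """
    return a_p * (1.0 - e_p**2)


def stable_ice_mass(particles, a_ice):
    """Sum the mass of particles whose equivalent circular orbit lies outside a_ice."""
    return sum(
        mass
        for mass, a_p, e_p in particles
        if equivalent_circular_radius(a_p, e_p) > a_ice
    )


# Hypothetical particles: (mass in arbitrary units, semi-major axis in R_S, eccentricity)
particles = [(1.0, 2.0, 0.1), (1.0, 1.7, 0.3), (2.0, 1.9, 0.05)]
print(stable_ice_mass(particles, a_ice=1.6))
```

Highly eccentric particles settle well inside their initial semi-major axis, so they are the first to fall below the stability distance.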

The earliest dynamical evolution of the tidally stripped debris is 
dominated by gravitational interactions with the remnant satellite 
(Supplementary Information), but after the satellite collides with 
Saturn, the debris settles into a pure ice ring containing up to a few 
times 10° g, with a surface density O,ing ~ 10° g cm ’,anda ring outer 
edge at the Roche limit for ice: dp j-¢ ~ 2.2Rs. The gas disk at this late 
time is estimated to have a much lower surface density than the ring, 
with og = 10gcm ° (Fig. 1; Supplementary Information). Describing 
Figure 2 | SPH simulation showing the tidal removal of ice from a
differentiated, Titan-mass satellite. SPH represents matter as particles, which 
are here evolved in response to gravity, shock dissipation, and pressure using 
the equation of state M-ANEOS”” (Supplementary Information). Type I 
migration and tidal interaction with the planet cause the satellite’s orbit to spiral 
inward from its initial Roche limit ($a_{\rm max}$) to the planet’s surface ($R_P$) in about
$10^4$ years (Supplementary Information). The satellite’s evolution across this
region is tracked with a series of SPH simulations. The satellite starts with
$a = a_{\rm max}$ and is evolved for several orbits with SPH to simulate tidal mass
removal and the establishment of a stable satellite remnant. The remnant 
satellite is then shifted inward by Aa ~ 10 7a (Supplementary Information) 
and re-simulated, with the process repeated until a is small enough that the 
satellite’s rocky core disrupts, which determines $a_{\rm rock}$. Frames here show tidal
stripping from a satellite with a composition of 55% ice and 45% silicate + metal 


drag by the gas disk on the ring as a shear stress on the disk surfaces 
gives a ring decay timescale*” of tga ~ 14Re(Gring/@g)(GMs/ ¢°), which 
is about 10’ years for these surface densities, Ty~ 200K and a 
Reynolds number Re~ 10° (Supplementary Information). This 
exceeds the expected persistence time of the gas disk (nominally 
~10° years), so the ring survives. 

Interparticle collisions cause the ring to spread and decrease in mass 
as inward-flowing material is lost and outward-diffusing material 
accumulates into moons”. The spreading timescale for a massive ring 


is²²,²³:
where $C(r)$ is a scaling factor”* of ~10 in the B ring and ~40 at $a_{R,\rm ice}$.
The initially massive ring envisioned here would, over the age of the
Solar System, have decreased in mass”! such that $\tau_\nu \sim 4.5$ Gyr, implying


Table 2 | Results of SPH simulations of tidal stripping

Satellite composition                              a_max   R_P,min   a_ice   M_ring (a_eq ≥ a_ice)
50% ice, 15% hydrated silicate and 35% silicate    1.7     1.5       1.8     2 × 10^25 g
50% ice and 50% silicate                           1.7     1.5       1.8     2 × 10^25 g
55% ice and 45% silicate + metal                   1.7     1.4       1.5     3 × 10^25 g
45% ice and 55% silicate + metal                   1.6     1.4       1.5     2 × 10^25 g


The first four columns list the compositions of the simulated satellites (which all had $M = M_T$), the semi-
major axis at which tidal loss of ice commences ($a_{\rm max}$), the smallest planet radius consistent with pure
ice loss ($R_{P,\rm min} = a_{\rm rock}$), and an estimate (see main text and Supplementary Information) of the ice
stability radius in the circumplanetary disk ($a_{\rm ice}$), based on planet contraction models¹⁴,¹⁵. All distances
are shown in units of Saturn’s mean radius, $R_S$. For a given satellite composition, the mass of tidally
stripped ice is maximized if the planet’s surface at the time of the satellite’s orbital decay is located at
$R_P = R_{P,\rm min}$, so that the satellite is consumed by the planet just before the loss of rock from its core. A
more extended planet with $R_P > R_{P,\rm min}$ causes an earlier collision and a reduced mass of stripped ice,
while a smaller planet radius and a later collision would lead to rocky fragments inconsistent with the
composition of Saturn’s rings. The final column shows the mass of ice particles having an equivalent
circular orbit, $a_{\rm eq}$, exterior to $a_{\rm ice}$, assuming $R_P = R_{P,\rm min}$. Similar total ring masses can be estimated
analytically (Supplementary Information). The $a_{\rm eq}$ distance is representative of the orbital radius to
which the mass of an initial fragment will settle after undergoing dissipative collisions but before
substantial diffusive spreading of the ring. By comparing $a_{\rm eq}$ to $a_{\rm ice}$, I assume ring particles are in
thermal equilibrium with the circumplanetary gas disk. If they instead radiate into the cooler
background nebula, ice particles could be thermally stable interior to $a_{\rm ice}$ (Supplementary Information).


(see Table 2) orbiting a Saturn-mass planet at $a = 0.97\,a_{\rm max}$ after 8 simulated
hours (a) and 25h (b). Distances are shown in units of 10° km; for comparison, 
Saturn’s B and A rings lie between ~92,000 and 137,000 km from the centre of 
Saturn. Dashed circles indicate the satellite’s orbit and Saturn’s current mean 
radius, Rs; Saturn’s radius at the time of the satellite’s decay was probably 

$R_P \sim 1.5\,R_S$ (refs 14 and 15; Supplementary Information). Material originating
from the satellite’s ice mantle is lost through its inner and outer Lagrange points
(L1 and L2), leading to particles on highly eccentric orbits (with $e \sim 10^{-1}$) with
semi-major axes interior and exterior to that of the satellite, respectively 
(Supplementary Information). Subsequent collisions between particles will 
tend to circularize their orbits, and the clumps seen in b are transient features. 
Interior particles will probably collide directly with the planet or be driven into 
the planet by the satellite, while exterior particles can supply the ring. 


that ¢,ing would now be a few times 10° g cm *, consistent with current 
estimates for the B ring™. 

Ring material spreading beyond the Roche limit accretes to form 
icy moons’. Each moon spawned from the ring’s outer edge grows 
until it reaches a mass such that the timescale for its recoil from the
ring due to resonant interactions, $\tau_{\rm recoil}$, is comparable to the timescale
for the ring’s outward diffusion’. With'” $\tau_{\rm recoil} \approx M_S^2[(a_m - r)/a_m]^3/
[1.68\,r^2\sigma_{\rm ring}\Omega m_m]$, setting $\tau_{\rm recoil} \sim \tau_\nu$ gives a characteristic moon
mass:


$$m_m^* \approx \frac{C}{1.68}\,\sigma_{\rm ring}\,r^2\left(\frac{a_m - r}{a_m}\right)^3 \qquad (3)$$


that depends on $\sigma_{\rm ring}$ and the position $a_m$ at which the moon accretes
relative to the ring edge $r$. Accretion models* find $a_m/a_R \sim 1.1$ to 1.2.
With $1.1 < (a_m/a_{R,\rm ice}) < 1.2$, $r = a_{R,\rm ice}$, a uniform surface density ring,
and C = 40, the estimated moon mass in grams is 3 X 10°(Gying/ 10° g 
cm” ?) <Mm* <2 X 10? essa 10°gcm ’) (in approximate agree- 
ment with Supplementary Fig. 4 of ref. 7). This is comparable to 
Tethys’ mass (6 X 107° g) for the initial o,;,,. values predicted here. 
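
The scaling of the characteristic moon mass with ring surface density can be illustrated with a short Python sketch. It rests on two assumptions that are not fully legible in the text: that the recoil timescale is $\tau_{\rm recoil} \approx M_S^2[(a_m - r)/a_m]^3/[1.68\,r^2\sigma_{\rm ring}\Omega m_m]$, and that the ring spreading timescale is $\tau_\nu \approx M_S^2/(C\sigma_{\rm ring}^2 r^4\Omega)$. Equating the two gives $m_m^* \approx (C/1.68)\,\sigma_{\rm ring} r^2[(a_m - r)/a_m]^3$. The ring edge is taken at $2.2\,R_S$, and the function name and the surface density used below are hypothetical.

```python
R_SATURN_CM = 5.8232e9  # Saturn's mean radius in cm (58,232 km)


def characteristic_moon_mass(sigma_ring, r_over_rs=2.2, am_over_r=1.1, c_factor=40.0):
    """Characteristic moon mass in grams.

    sigma_ring : ring surface density in g cm^-2
    r_over_rs  : ring outer edge in Saturn radii (assumed at the Roche limit for ice)
    am_over_r  : accretion distance of the moon relative to the ring edge
    c_factor   : viscosity scaling factor C
    """
    r = r_over_rs * R_SATURN_CM
    x = (am_over_r - 1.0) / am_over_r  # (a_m - r) / a_m
    return (c_factor / 1.68) * sigma_ring * r**2 * x**3


# Mass per unit surface density at the two ends of the accretion range
for ratio in (1.1, 1.2):
    mass = characteristic_moon_mass(1.0, am_over_r=ratio)
    print(f"a_m/r = {ratio}: {mass:.1e} g per (g cm^-2)")
```

Because the result is linear in the surface density, the bounds for any assumed ring follow by simple multiplication.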

Upon reaching a mass of about $m_m^*$, resonant torques drive the
moon away from the ring until its most distant strong resonances
migrate out of the ring”’’. For the 2:1 inner Lindblad resonance, this
occurs when $a_m = 1.6\,r$, or when $a_m \sim 3.6\,R_S$ for $r = a_{R,\rm ice}$. At this point
the moon is exterior to Saturn’s early synchronous radius (Supplemen- 
tary Information), so that it continues to evolve outward because of 
tidal interaction with the planet. As each moon recoils away from the 
ring, a new moon is spawned from the ring’s outer edge, with the moon 
masses decreasing with time as $\sigma_{\rm ring}$ decreases’. As moons evolve
outward, any mixing with material originating from outside the rings 
would increase their rock content somewhat relative to that of the 
rings. The densities of the moons interior to and including Tethys 
imply that as a group they contain about 90% ice and 10% rock 
(Table 1; Supplementary Information). 

A primordial ring must avoid contamination by impacts of silicate- 
rich micrometeoroids throughout its 4.5-billion-year lifetime in order 
to produce the >90 to 95% water-ice ring observed today. Previous 
work’ suggests that Saturn’s current rings would be polluted in only a
few times 10° years. However, the rings could be primordial and still 
unpolluted if the impact rate was overestimated and/or the rings’ mass 
was underestimated). During its extended mission, the Cassini space- 
craft will indirectly sample the impact rate, and will directly measure 
the rings’ current total mass’. While prior ring origin theories (Sup- 
plementary Information) have envisioned an initial ring comparable in 
mass to the current rings, the model here implies an initial ring that is 
several orders of magnitude more massive. A massive early ring would 
be less vulnerable to pollution by rock-rich impacts, and also has the 
advantage of providing sufficient mass and angular momentum (Sup- 
plementary Information) to account ultimately for both the current 
rings and the inner ice-rich Saturnian satellites. 
Pleats in crystals on curved surfaces


William T. M. Irvine'}, Vincenzo Vitelli? & Paul M. Chaikin! 


Hexagons can easily tile a flat surface, but not a curved one. 
Introducing heptagons and pentagons (defects with topological 
charge) makes it easier to tile curved surfaces; for example, soccer 
balls based on the geodesic domes’ of Buckminster Fuller have 
exactly 12 pentagons (positive charges). Interacting particles that 
invariably form hexagonal crystals on a plane exhibit fascinating 
scarred defect patterns on a sphere” *. Here we show that, for more 
general curved surfaces, curvature may be relaxed by pleats: 
uncharged lines of dislocations (topological dipoles) that vanish 
on the surface and play the same role as fabric pleats. We experi- 
mentally investigate crystal order on surfaces with spatially varying 
positive and negative curvature. On cylindrical capillary bridges, 
stretched to produce negative curvature, we observe a sequence of 
transitions—consistent with our energetic calculations—from no 
defects to isolated dislocations, which subsequently proliferate and 
organize into pleats; finally, scars and isolated heptagons (previ- 
ously unseen) appear. This fine control of crystal order with curv- 
ature will enable explorations of general theories of defects in 
curved spaces*"'. From a practical viewpoint, it may be possible 
to engineer structures with curvature (such as waisted nanotubes 
and vaulted architecture) and to develop novel methods for soft 
lithography” and directed self-assembly”’. 

Topological defects have played a crucial role in understanding the 
order, rigidity and melting of crystals and other phases of matter in 
two-dimensional flat space’*’*. On a curved surface (Fig. 1), these 
particle-like excitations acquire a new life: they interact not only with 
each other, but with the curvature of the substrate. In a hexagonal 
lattice in which every particle has six nearest neighbours (Fig. 1, inset), 
there are two types of topological defects (Fig. 2): disclinations that 
disrupt orientational order and appear as points of local five-fold or
seven-fold symmetry (pentagons or heptagons, having topological
charge ±(2π/6)), and dislocations, which disrupt translational order
and appear as disclination dipoles (+/− pairs). That disclinations
couple to curvature can be understood intuitively by taking a piece
of paper and adding, or removing, a π/3 wedge to ‘make’ a disclina-
tion (Fig. 2c, d).

A host of new discoveries**”"* have resulted from studies of these 
defects on the simplest curved surface: the sphere. With increasing size, 
the familiar 12-pentagon soccer ball pattern gives way to ‘scars’, pen- 
tagons dressed by strings of dislocations” *. In this Letter, we introduce 
a different configuration of dislocations, namely ‘pleats’, which are topologic-
ally uncharged grain boundaries with variable spacing that vanish on
the surface. We experimentally investigate their interaction with curv- 
ature and show when pleats are energetically favoured over undefected 
crystals or topologically charged disclinations. Apart from experiments 
on spheres and bubble-bearing paraboloids, the interaction 
of defects with variable curvature, negative curvature and surfaces of 
different topologies has remained largely unexplored experimentally 
and is of growing theoretical interest. 

The topology of surfaces places a constraint on the total defect 
charge: for example, the net charge on a sphere must be 4π, as exemplified 
by a soccer ball that has 12 (= 4π/(π/3)) pentagons dispersed 
among hexagons. A hemisphere, or disk, requires half the topological 


Figure 1 | Colloidal crystals on curved oil-glycerol interfaces. 

a-d, Fluorescent PMMA particles bound, by image attraction, to oil-glycerol 
interfaces in the shape of spheres (a), domes (b), waists (c) and barrels (d) (see 
Supplementary Information section 2 for details of shape). The particles interact 
via a repulsive screened Coulomb interaction and, on a flat surface, arrange into a 
hexagonal crystal lattice (a, inset). The oil phase is a mixture of cyclohexyl 
bromide and dodecane that matches the refractive index of glycerol, allowing us 
to image particles on highly curved interfaces by confocal microscopy. Because 
of their topology, crystals on spheres and domes require a net defect charge of 
12 × (2π/6) and 6 × (2π/6), respectively, whereas waists and barrels require 
none. While spheres (a) have no boundary, the remaining surfaces (b-d) do, 
allowing topological defects a choice between boundary and bulk. 
charge of a sphere, whereas a cylinder can be defect-free with a requirement 
of 0 total charge. 
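
The counting in this paragraph can be checked with a short Python sketch. It assumes the standard relation between the total disclination charge and the Euler characteristic χ of the surface, total charge = 2πχ (χ = 2 for a sphere, 1 for a hemisphere or disk, 0 for a cylinder), with each elementary disclination carrying π/3. The function name is illustrative and not taken from the text.

```python
import math

def required_pentagon_excess(euler_characteristic):
    """Net number of +pi/3 disclinations demanded by topology.

    Assumes total disclination charge = 2*pi*chi, where chi is the
    Euler characteristic, and an elementary charge of pi/3.
    """
    total_charge = 2.0 * math.pi * euler_characteristic
    return round(total_charge / (math.pi / 3.0))

for name, chi in [("sphere", 2), ("hemisphere or disk", 1), ("cylinder", 0)]:
    print(name, required_pentagon_excess(chi))
```

The sphere recovers the 12 pentagons of the soccer ball, the hemisphere half of that, and the cylinder none.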

Topology constrains the total charge, but it is energetics that deter- 
mines the number and the arrangement of each charge. In elasticity 
theory, disclinations appear as discrete interacting charges and the 
Gaussian curvature as a charge density, both acting as sources of stress: 

$$\frac{1}{Y}\nabla^{4}\chi = G(x) - \sum_{\alpha} q_{\alpha}\,\delta(x - x_{\alpha}) \qquad (1)$$

where G = 1/(R₁R₂) is the Gaussian curvature (with R₁ and R₂ the principal 
radii of curvature), q_α is the charge of the disclination at x_α, δ represents 
a Dirac delta function, Y is Young’s modulus and χ is the Airy stress 
function. We will refer to the partial or complete cancellation of 
the effects of curvature and stress by topological charges (r.h.s. of 
equation (1)) as screening. For curvature concentrated at a point, as 
in our model paper disclinations of Fig. 2c, the screening can be perfect: 
the lattice is stress free and the energy E = (1/2Y) ∫ (∇²χ)² dA = 0 
(where we have assumed no bending energy). For smooth surfaces, the 
screening is more subtle. Geometry provides some insight: consider a 
geodesic triangle drawn on a curved surface, for example, a sphere. 
Curvature causes lines to diverge or converge, affecting the angles at 
the vertices: the sum of the external angles will differ from 2π by 
Δθ = ∫G dA. The same applies to any closed loop formed by connecting 
lattice sites and serves as a measure of angular strain. If instead, a 
disclination is encircled by the loop, by definition this adds/removes 




Figure 2 | Disclinations and pleats in a hexagonal lattice. a, Disclinations in a 
hexagonal lattice are topological defects that result from an extra (right panels) 
or missing (left panels) 60° crystalline wedge that matches their topological 
charge of +/−(2π/6), marked here by cream/brick circles, respectively. At the 
‘core’ of the disclination is a corresponding five-fold (or seven-fold) coordinated 
particle. In flat space, disclinations produce a large amount of stress in a crystal, 
disrupting orientational order. b, This stress can be relieved by buckling the 
crystal to create a curved surface. Five-fold disclinations are sources of positive 
Gaussian curvature, seven-fold disclinations of negative curvature. c, This 
coupling can be intuitively understood by making disclinations out of paper (see 
Supplementary Information for instructions and cut-outs). Paper can be bent 
much more easily than it can be stretched/compressed, so it bends, resulting in a 
surface free of stress, with all the Gaussian curvature concentrated at the 
locations of the disclinations. Neighbouring crystal planes diverge on these 
surfaces, matching the geodesics of the curved surface. d, Dislocations, 
a contribution of ±(2π/6) regardless of the size of the loop. If sufficient 
curvature is enclosed, the angular stress generated by the disclination is 
screened on the outside. Within the patch the screening is incomplete 
and leads to an energetic cost. For later use we define Ω = (3/π) ∫ G dA, 
the integrated curvature in units of disclinations. 

The crystals we create consist of poly(methyl methacrylate) 
(PMMA) particles (~2 μm diameter) that are bound to an oil-glycerol 
interface and repel each other. By index-matching the oil to the 
glycerol, we can image the full surfaces (Fig. 1, Methods). The first 
surfaces we investigate are domes (truncated spheres), created by 
droplets sitting on a coverslip, with a circular contact line. The contact 
angle, controlled by treating the glass surface, determines the solid 
angle of the spherical droplet. 

The domes (Fig. 3) exhibit both disclinations and scars as previously 
seen on spherical surfaces. However, for domes, the net disclination 
charge on the surface can vary as the dome inflates from a disk to a full 
sphere. In Fig. 3b we show the topological charge that is on the surface of 
the dome and detached from the boundary as a function of Ω. We find 
the intuitive result that the detached charge varies approximately linearly 
with Ω. That is, for a full sphere there are 12 pentagons (charge +π/3 each), for 
a hemisphere there are 6, and on smaller fractions 
the two remain approximately proportional. Note that the topological 
requirement of a total charge of 6 × (π/3) is satisfied at all times by compensating 
charges on the boundary. 
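
As a rough illustration of how Ω grows as a dome inflates, the sketch below evaluates Ω = (3/π) ∫ G dA for a spherical cap. It assumes an ideal cap on a sphere, described by its polar half-angle (an illustrative parameter, not one used in the text); the radius cancels because G = 1/R² and the cap area is 2πR²(1 − cos θ).

```python
import math

def integrated_curvature_units(polar_half_angle):
    """Omega = (3/pi) * integral of G dA for an ideal spherical cap.

    polar_half_angle is in radians: pi/2 gives a hemisphere and
    pi gives a full sphere. The sphere radius drops out.
    """
    integral_G_dA = 2.0 * math.pi * (1.0 - math.cos(polar_half_angle))
    return 3.0 / math.pi * integral_G_dA

for label, angle in [("flat disk", 0.0),
                     ("hemisphere", math.pi / 2),
                     ("full sphere", math.pi)]:
    print(label, round(integrated_curvature_units(angle), 3))
```

Under these assumptions a hemisphere gives Ω = 6 and a full sphere Ω = 12, matching the pentagon counts quoted above.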

Negative curvature surfaces lack the simplicity and familiarity that 
we associate with positive curvature surfaces such as the sphere. For 


uncharged pairs of seven- and five-fold disclinations, can also be made by 
folding and gluing hexagonal paper. A set of three closely spaced, aligned 
dislocations (7-5,7-5,7-5) are shown on an approximately relaxed sheet. This is a 
grain boundary which vanishes at the centre of the sheet—a ‘pleat’. Note that 
negative curvature emanates from the vanishing point of the pleat, as evidenced 
both by the buckling of the sheet and by the 30° divergence of parallel lines 
impinging from the top. e, A stress-free pleat can be achieved by allowing steps 
out of the surface. The pleat retains the property that width is added along the 
pleat length in proportion to the linear density of dislocations. f, The top of the 
Chrysler building in New York consists of four vertical pleats on a cylinder. Here 
the pleats are formed from dislocations in a square field. Counting from the top, 
steps 3, 4, 5, 6 are approximately equally spaced at 8.4 m and form a cone with no 
Gaussian curvature. A gradient in linear dislocation density is achieved by 
spacing the second and first steps at 9.4 m and 9.8 m, resulting in a spike 
with negative Gaussian curvature crowning the cone. 
Figure 3 | Topological charge on domes and waists. a, Isolated seven-fold 
disclinations, observed here for the first time, can be seen at the neck of the 
capillary bridge (top). On the dome, sphere and barrel (bottom), the defects 
group into grain boundaries each having a single five-fold disclination in excess. 
The angular length of these scars is uniform; the number of dislocations 
depends only on R/a. The defect configurations on both surfaces are similar to 
the arrangement of defects on the surface of a sphere with no boundary. Both 
surfaces have uniform positive curvature. The black dots represent particles 
stuck to the glass surface, defects with charge +π/3 are coloured brick, defects 
with charge −π/3 are coloured cream. b, Detached topological charge on 


Figure 4 | Pleating and disclination unbinding on a stretched capillary 
bridge. a-d, Confocal images of a capillary bridge that was stretched from 
quasi-cylindrical to highly curved. h-k, The corresponding reconstructed and 
triangulated surfaces. a, h, The compressed bridge exhibits very few defects. In 
this regime, the bridge is weakly curved and therefore a large patch of integrated 
curvature is required to screen the charge of a disclination. The circle that 
encloses such a patch is shown in e; clearly its radius exceeds the height of the 
bridge. As the bridge is stretched (i-j), we observe the appearance of 
dislocations, neutral disclination dipoles, polarized with the seven-fold defects 
towards the maximally negatively curved neck. The crystal planes can be seen to 
domes and waists versus the integrated Gaussian curvature, Ω. Detached 
charge is the sum of the charge of isolated disclinations and of scars that are not 
connected to the boundary. On domes (blue symbols), we find the intuitive 
result that there is approximately one (π/3) topological charge per π/3 
worth of integrated curvature. On negatively curved capillary bridges (red 
symbols), we find no net disclinations on the surface for an integrated curvature 
down to −10. For total curvature beyond the threshold of −10, disclinations 
rapidly fill the surface until 12 (−π/3) disclinations match the Ω = −12 
curvature. The dashed lines are guides to the eye. 


diverge around these dislocation pairs, healing the stress induced by the 
negative curvature. As the bridge is stretched further, these dislocations 
proliferate, forming ‘pleats’, neutral grain boundaries that vanish in proximity 
of the neck. Uniform pleats on a skirt (f) reduce the circumference with height, 
producing a conical shape. Decreasing the pleat density with height would lead 
to negative curvature flaring the skirt. The divergence of the crystal planes at the 
opening of a high-density pleat (grey lines) is roughly half that observed around 
an isolated disclination (k). Finally, as the size of a patch required to screen the 
topological charge of an isolated seven-fold defect fits on the surface (g), we 
observe the appearance of isolated disclinations. 
Figure 5 | Disclination and polarization charge on the surfaces of waists, 
domes and barrels. On each surface in our ensemble, we considered a patch 
(‘pillbox’) of variable size, for example as represented by the area between the 
red lines in the inset (bottom right) and computed the integrated Gaussian 
curvature (horizontal axis) versus the accumulated disclination plus 
polarization charge in the patch (vertical axis). All topological charges in the 


example, it is not possible to embed a complete pseudo-sphere (sphere 
of negative curvature) in three-dimensional space. We create negative 
curvature surfaces by forming capillary bridges between two surfaces 
treated to have partially wetting glycerol contacts. These surfaces, 
shown in Figs 1, 3 and 4, are constrained to have constant mean 
curvature, 1/R₁ + 1/R₂, and enclose a fixed volume. They are unduloids, 
nodoids and for some specific constructions catenoids or sections of 
spheres (see Supplementary Information section 2). Topologically 
equivalent to cylinders and therefore requiring zero net charge, these 
surfaces typically have varying Gaussian curvature, as shown by the 
colour shading in Fig. 3b. On highly curved surfaces, we observe isolated 
−π/3 disclinations (heptagons) for the first time (Fig. 3a). These 
are the negative curvature counterpart of the pentagons on the sphere 
and can also appear as ‘scars’ with an excess heptagon. However, if we 
plot (Fig. 3) the charge detached from the boundary versus Ω for an 
ensemble of capillary bridges, we find no net topological charge on the 
surface for an integrated curvature down to −10. For total curvature 
beyond this threshold, disclinations rapidly fill the surface until 12 
heptagons, each (−π/3), match the Ω = −12 curvature. If no isolated 
charges are present below threshold, how does the crystal screen the 
curvature? A glance at surfaces below threshold (Fig. 4i, j) immediately 
suggests an answer: charge neutral dislocations. 

To investigate this regime, we pulled apart the coverslips that con- 
fine the capillary bridges. As the boundary of the bridges is pinned and 
their volume conserved, their shape changes from almost cylindrical to 
highly curved, allowing us to follow the introduction of defects below 
threshold (see Fig. 4). The first defects that we observe as the (negative) 
curvature is increased are dislocations, dominantly polarized with their 
(−π/3) disclinations pointing towards the region of highest negative 
curvature, the waist of the capillary bridge. As the integrated negative 
curvature is increased, these polarized dislocations proliferate and 
organize into lines. In flat space these lines, known as grain boundaries, 
close on themselves or span the whole sample. Here we observe neutral 
grain boundaries which vanish on the surface. They are analogous to 
fabric ‘pleats’. In clothing, the different circumference of the waist and 
the hips is often accommodated by vertical pleats (Fig. 4f). Along the 
length of the pleat, extra fabric is added to the width. 
patch (excluding the edges) contribute, whether isolated or connected by a pleat 
to the edge. Once pleats and scars are taken into account, we regain the intuitive 
result of a linear relation between Gaussian curvature and defect charge. Note 
that the triangular symbols correspond to samples with no detached charge on 
the surface. 


At the point where the pleat begins, the opening angle Δθ is determined 
by the dislocation line density n_d according to 

$$\Delta\theta = n_{d}\,a \qquad (2)$$

where a is the lattice spacing. If Δθ perfectly matches the integrated 
Gaussian curvature, the stress induced by the substrate is completely 
screened. The smallest Δθ is achieved by a pleat for which n_d is so low 
that only a single dislocation is present on a surface of characteristic 
length H. Equation (2) then gives a geometric criterion for the onset of 
pleating: Δθ = |∫G dA| ~ a/H, which can be arbitrarily small. Perfect 
screening cannot be achieved on a smooth surface. However, balancing 
the energy of a stretched but un-defected crystal, YΔθ²H², with the 
energy of a pleat, YaHΔθ log(Δθ), gives an energetic criterion for the 
onset of pleating, |∫G dA| ~ (a/H) log(H/a), that agrees with the geometric 
criterion up to logarithmic corrections (Supplementary Information). 
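
The two onset criteria can be compared numerically. The sketch below is only a scaling estimate: order-one prefactors are not fixed by the argument above, and the lattice spacing and system size used in the example are hypothetical values chosen for illustration.

```python
import math

def pleating_thresholds(a, H):
    """Scaling estimates of the integrated curvature at pleat onset.

    a: lattice spacing, H: characteristic surface length (same units).
    Returns (geometric, energetic) = (a/H, (a/H) * log(H/a)).
    """
    ratio = a / H
    geometric = ratio
    energetic = ratio * math.log(H / a)
    return geometric, energetic

# Hypothetical example: a surface 100 lattice spacings across.
geometric, energetic = pleating_thresholds(a=1.0, H=100.0)
print(f"geometric ~ {geometric:.3f} rad, energetic ~ {energetic:.3f} rad")
```

For any H much larger than a, the energetic estimate exceeds the geometric one only by the logarithmic factor log(H/a).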

There is a maximum opening angle for pleats, Δθ_max ≈ π/6 (about 30°), that corresponds 
to the maximum dislocation line density n_d attainable in equation 
(2), see Fig. 2d. If we place these pleats along the axis of a cylinder the 
result is a cone, as illustrated in Fig. 2f, which shows the conical shape 
of the top of the Chrysler building, in New York. If the dislocation 
density in the pleats (or the density of pleats) varies, then we can have 
Gaussian curvature. On the Chrysler building, such a gradient in spacing 
produces a negative curvature spike that crowns the cone. 

When our capillary bridges are strongly curved, multiple pleats and 
scars act in concert to relieve the strain induced by the curvature. This 
cooperative screening can be understood by treating dislocations as dipoles 
of disclinations. Just as in electrostatics a divergence in polarization 
produces a polarization charge, pleat gradients can produce an effective 
disclination polarization charge, which along with isolated disclinations 
contributes to screen curvature so that 

$$\int G\,dA \approx \sum_{\alpha=1}^{n} q_{\alpha} + \int dA\,\nabla\cdot\mathbf{P}(x) \qquad (3)$$

where P(x) is the dislocation density per unit area. In Fig. 5, we verify 
that the integrated Gaussian curvature over a ‘patch’ of variable size on 
each surface in our ensemble (including domes, barrels and waists) is 
approximately equal to the accumulated disclination and polarization 
charges in the patch (excluding surface edges), for all regimes of 
deformation. 

Pleats allow a finer screening of curvature than scars, as the angular 
deficit produced by disclinations is not tunable but quantized in units 
of π/3. Upon equating π/3 to the Gaussian curvature multiplied by the 
area of a patch of radius r, we have a heuristic criterion for scarring, 
r = √(1/(3|G|)), that is consistent with our observations. If the distance to the 
boundary of our curved substrates is greater than r, isolated disclinations 
or scars are observed; if less than r, pleats arise as the generic 
mode of stress relaxation. 
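
The heuristic can be written as a small decision rule. This sketch assumes the patch area is approximated by the flat-disk value πr², so that |G|πr² = π/3 gives r = 1/√(3|G|); the function names and the example numbers are illustrative only.

```python
import math

def scarring_radius(gaussian_curvature):
    """Patch radius r at which |G| * pi * r**2 equals pi/3."""
    return 1.0 / math.sqrt(3.0 * abs(gaussian_curvature))

def expected_defect(distance_to_boundary, gaussian_curvature):
    """Apply the heuristic: far from the boundary, disclinations or scars;
    closer than r, pleats."""
    r = scarring_radius(gaussian_curvature)
    if distance_to_boundary > r:
        return "disclinations or scars"
    return "pleats"

# Unit sphere (G = 1): r = 1/sqrt(3), roughly 0.58 sphere radii.
print(scarring_radius(1.0))
print(expected_defect(1.0, 1.0))
print(expected_defect(0.3, -1.0))
```

The absolute value lets the same rule serve negatively curved waists, where the relevant charge is a heptagon rather than a pentagon.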

In summary, pleats (neutral dislocation lines that vanish on 
the surface) afford a much finer screening of Gaussian curvature than 
do disclinations of quantized topological charge, because the area 
surrounding the vanishing point of a pleat can have a continuously 
controllable angle deficit between 0 and π/6. Whether pleats may have 
a role to play in the design of geodetic-dome-like structures with equal 
length struts we leave open for exploration. Beyond the physics of 
crystalline defects, the experimental playground we have developed 
is ideal for exploring general questions concerning the physics of order 
and disorder” in curved space. 


METHODS SUMMARY 

Sample preparation. The oil-glycerol surfaces coated with PMMA particles are 
prepared as follows. First, the glycerol droplets and capillary bridges are prepared 
in contact with air in a capillary channel. Then, the channel is filled with particles 
suspended in the oil. Particles in the bulk can be removed by subsequently flushing 
the channel with clean oil. The PMMA particles, prepared following refs 26, 27, are 
coated with a layer of poly(hydroxy stearic acid) (PHSA), which charges positively 
(~100 charges per particle) in the oil”*. The oil phase is a mixture of cyclohexyl 
bromide (CHB) and dodecane that matches the refractive index of glycerol, allow- 
ing us to image with minimal distortion the full surface even when it is highly 
curved (Fig. 1). The particles are highly hydrophobic, but nonetheless bind to the 
glycerol-oil interface by image charge attraction to the higher dielectric glycerol 
(ε_glycerol ≈ 42, ε_CHB ≈ 7.9). The dielectric contrast also drives the migration of ions 
from the oil phase into the glycerol, cleaning the oil above the interface, leading to a 
Debye screening length of ~50 μm. Furthermore, positive ions are pumped from 
the oil to the water phase preferentially over negative ions, causing the glycerol to 
charge positively and repel residual particles suspended in the oil. These effects 
combine to create a clean lattice of strongly repulsive particles uninfluenced by 
particles in the bulk. 

Imaging and reconstruction. Particles were imaged using a Yokogawa CSU-10 
spinning disk confocal microscope and a Leica SP5 confocal microscope. Confocal 
images were rendered using Andor IQ. Particles were tracked using standard IDL 
tracking routines”. Triangulations and rendering were done using custom codes 
written in Matlab. 
Magnetic fields at the Earth’s surface represent only a fraction of the 
field inside the core’. The strength and structure of the internal field 
are poorly known”, yet the details are important for our understand- 
ing of the geodynamo. Here I obtain an indirect estimate for the field 
strength from measurements of tidal dissipation. Tidally driven flow 
in the Earth’s liquid core develops internal shear layers, which distort 
the internal magnetic field and generate electric currents. Ohmic 
losses damp the tidal motions and produce detectable signatures in 
the Earth’s nutations. Previously reported evidence of anomalous 
dissipation in nutations** can be explained with a core-averaged field 
of 2.5 mT, eliminating the need for high fluid viscosity® or a stronger 
magnetic field at the inner-core boundary’. Estimates for the internal 
field constrain the power required for the geodynamo”®*. 

Tidal forces with periods of around one cycle per day cause variations 
in the direction of the Earth’s rotation (known as precession and 
nutation). The motion includes separate angular velocities for the solid 
and fluid cores (Ω_s and Ω_f, respectively), which are misaligned with 
the angular velocity of the mantle, Ω_m. Coupling between the motion 
of the core and the mantle gives rise to free modes of oscillation that are 
resonantly excited at nearby tidal periods. One of these modes, the free 
core nutation, is characterized by angular velocities Ω_s and Ω_f that tilt 
together from the initial angular velocity of the mantle. The other 
mode is usually called the free inner-core nutation (FICN) because 
the motion mainly involves a tilt of Ω_s. 

Flow in the fluid core is often approximated by a solution due to 
Poincaré. Fluid vorticity in the core is initially 2Ω₀ẑ, where Ω₀ is the 
rotation rate of the Earth and ẑ is the polar axis of the mantle. When Ω_f 
tilts out of alignment with ẑ, the fluid is assumed to have uniform 
vorticity of 2Ω_f, corresponding to rigid-body rotation about an 
inclined axis. However, a rigid-body rotation cannot satisfy boundary 
conditions on the fluid velocity because of the spheroidal shape of the 
core. Consequently, the solution for the fluid velocity has the form 

$$\mathbf{v}_f(\mathbf{r},t) = [\boldsymbol{\Omega}_f(t) - \boldsymbol{\Omega}_m(t)] \times \mathbf{r} + \mathbf{v}'(\mathbf{r},t)$$


where r is the position vector, t is time and v’ describes the deviation 
from rigid-body rotation. Poincaré’s solution for v’ enforces boundary 
conditions at the core-mantle boundary without disturbing the con- 
dition of uniform vorticity. Thus, an exact solution is found for the flow 
of an inviscid fluid when the core is entirely liquid (that is, there is no 
inner core). The solution for v’ can be represented by a potential flow, 
which is readily adapted to satisfy boundary conditions when the inner 
core is present. However, the resulting fluid velocity does not obey the 
governing equations when the inner core tilts out of alignment with the 
mantle (Methods). Strain in the flow alters the vorticity (through vor- 
tex stretching), invalidating the assumption of uniform vorticity. 

The breakdown of Poincaré’s solution is due primarily to a tilt of 
the inner core. I investigate the breakdown by determining v’ when the 
angular velocities Ω_f and Ω_s are prescribed. It suffices to assume that 
the mantle rotates with constant angular velocity, Ω₀ẑ, while Ω_s(t) and 
Ω_f(t) vary periodically with frequency ω. The amplitude and phase of 
Ω_s(t) and Ω_f(t) are chosen to conserve angular momentum, so the 
motion corresponds closely to the FICN free mode. 


Differential rotation between the fluid and solid cores causes a small 
radial motion at the inner-core boundary due to hydrostatic flattening. 
The total fluid velocity, v_f, is governed by the linearized Navier-Stokes 
equation 

$$\frac{\partial \mathbf{v}_f}{\partial t} + 2\hat{\mathbf{z}} \times \mathbf{v}_f = -\nabla P + E\,\nabla^{2}\mathbf{v}_f$$

which is written in non-dimensional form using the radius, R, of the core 
as a length scale and Ω₀⁻¹ as a timescale; P is the fluid pressure and the 
Ekman number, E = ν/(Ω₀R²), characterizes the effect of fluid viscosity, 
ν, on the dynamics. For typical parameters (ν = 10⁻⁶ m² s⁻¹, 
R = 3.48 × 10⁶ m, Ω₀ = 0.73 × 10⁻⁴ s⁻¹), the Ekman number is 
roughly 10⁻¹⁵. Numerical calculations for realistic values of E are not 
feasible, but it is possible to decrease E until a persistent pattern of flow 
emerges. Scaling relations are used to extrapolate the numerical solution 
to the appropriate value of E. 
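
As a quick check of the quoted order of magnitude, the Ekman number can be evaluated directly from the typical parameters given above. This is a minimal sketch; the function and variable names are illustrative.

```python
def ekman_number(viscosity, rotation_rate, radius):
    """E = nu / (Omega_0 * R**2), all quantities in SI units."""
    return viscosity / (rotation_rate * radius ** 2)

nu = 1.0e-6        # kinematic viscosity, m^2 s^-1
omega_0 = 0.73e-4  # rotation rate of the Earth, s^-1
R = 3.48e6         # radius of the core, m

E = ekman_number(nu, omega_0, R)
print(f"E = {E:.2e}")
```

The result is slightly above 10⁻¹⁵, which is far below what direct numerical calculation can reach and motivates the extrapolation by scaling.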

Figure 1a shows the kinetic energy in v’ on a meridional cross-section 
through the outer core for E = 2 × 10⁻⁷. The kinetic energy 
is concentrated in narrow bands (shear layers) that form conical surfaces 
in three dimensions. The orientation of the bands is set by the 
propagation and reflection of inertial waves; closed propagation paths 
within the core establish the location of the shear layers. 

The imposed radial motion at the inner-core boundary (maximum 
at mid latitudes) is not evident in Fig. 1a because the flow associated 
with the internal shear layers is much larger. Indeed, the maximum 
flow occurs at the rotation axis, where the wave motion converges 
along conical surfaces". 

The width, l_v, of the shear layer decreases as E is lowered. A quantitative 
estimate for l_v is obtained from the ratio of viscous dissipation to 
kinetic energy in v’ (Supplementary Information). Once E < 10⁻⁶, the 
width approaches an asymptotic value of l_v = E^(1/3), which is consistent 
with theoretical estimates. A realistic value of E = 10⁻¹⁵ yields 
l_v = 10⁻⁵, corresponding to 35 m in the core. 
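
The conversion from the dimensionless width to metres is a one-line calculation. The sketch below assumes the asymptotic scaling l_v = E^(1/3) holds at the realistic Ekman number and uses the core radius quoted earlier as the length scale; the names are illustrative.

```python
R_CORE_M = 3.48e6  # radius of the core used as the length scale, m

def shear_layer_width_m(ekman, radius_m=R_CORE_M):
    """Dimensional shear-layer width, assuming l_v = E**(1/3)."""
    return ekman ** (1.0 / 3.0) * radius_m

width_m = shear_layer_width_m(1.0e-15)
print(f"shear-layer width ~ {width_m:.1f} m")
```

With E = 10⁻¹⁵ the dimensionless width is 10⁻⁵, which gives roughly 35 m.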

Narrow shear layers distort the local magnetic field and generate 
electric currents. Most of the current is produced inside the shear 
layers, but diffusion allows the current to spread beyond them. The 
magnetic skin depth is nominally 200 m at diurnal periods, so the 
current layer is thicker than the shear layer. A quantitative description 
of the magnetic perturbation, b, is given by the linearized induction equation, 
which is written in non-dimensional form as 

$$\frac{\partial \mathbf{b}}{\partial t} = \nabla \times (\mathbf{v}' \times \mathbf{B}) + D\,\nabla^{2}\mathbf{b}$$

where B is the initial magnetic field and D = η/(Ω₀R²) characterizes the 
magnetic diffusivity, η. I set D = 10E^(2/3) so that the width of the current 
layer (or skin depth, l_b = (2D)^(1/2)) is constant relative to l_v as E is varied, 
yielding the expected value when E = 10⁻¹⁵. A uniform field, B = Bẑ, 
is assumed for simplicity because the structure of the internal field is 
unknown. Larger perturbations occur if B is perpendicular to the shear 
layer, but this configuration is unlikely to be typical. The intersection of 
shear layers with a uniform, vertical field approximates the average 
dissipation for a random field orientation, which may be appropriate 
Figure 1 | Structure of flow and magnetic field perturbation in the fluid core 
when the inner core tilts out of alignment with the mantle. a, Kinetic energy 
in v’ on a meridional cross-section for E = 2 × 10⁻⁷. The flow is driven by 
radial motion at the inner-core boundary with a prescribed frequency 
ω = Ω₀(1 − ε), where ε = 0.0025 is the hydrostatic flattening of the inner core. 
Shear layers of width O(E^(1/3)) are oriented in the direction of inertial wave 
propagation. Red, highest energy; deep blue, reference (zero-energy) state. 

b, Magnetic energy in the perturbed field due to the influence of shear layers on 
a uniform, vertical magnetic field. Steep gradients in the perturbed magnetic 
field cause electric currents that damp the tidal motion. 


for the true field. Figure 1b shows the magnetic energy in b for 
E = 2 × 10⁻⁷ and an initial magnetic field of B = 1 mT. A comparison 
of the ohmic dissipation with the magnetic energy in b confirms that l_b 
is approximately equal to the skin depth. 
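
The choice D = 10E^(2/3) can be checked numerically: it keeps the ratio of skin depth to shear-layer width fixed as E changes. The sketch below assumes l_v = E^(1/3) and l_b = (2D)^(1/2) in dimensionless form, with the core radius as the length scale; the names are illustrative.

```python
import math

R_CORE_M = 3.48e6  # radius of the core used as the length scale, m

def layer_widths(ekman):
    """Return dimensionless (l_v, l_b) for a given Ekman number."""
    l_v = ekman ** (1.0 / 3.0)        # shear-layer width
    D = 10.0 * ekman ** (2.0 / 3.0)   # prescribed diffusivity parameter
    l_b = math.sqrt(2.0 * D)          # magnetic skin depth
    return l_v, l_b

for E in (2.0e-7, 1.0e-15):
    l_v, l_b = layer_widths(E)
    print(f"E = {E:.0e}: l_b/l_v = {l_b / l_v:.2f}, l_b = {l_b * R_CORE_M:.0f} m")
```

The ratio is √20, about 4.5, at every E, and the dimensional skin depth at E = 10⁻¹⁵ comes out near 160 m, the same order as the nominal 200 m quoted above.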

The amplitude of b, denoted b, depends on both the shear, v’/l_v, and the ratio 
of length scales l_b/l_v. Because the ratio l_b/l_v is held constant in the 
calculation, the magnetic perturbation is expected to scale as b/B ∝ 
v/E^(1/3), where v is the amplitude of v’. Order-of-magnitude estimates 
for the resulting magnetic energy and ohmic dissipation are b² and 
Db²/l_b², respectively. Both of these quantities vary with Ekman number 
as E^(−2/3) because l_b is set by the skin depth. Direct calculations confirm 
the trend in dissipation with E (Fig. 2). Thus, the ohmic dissipation is 
substantially larger for realistic values of E owing to the thin shear 
layers. For example, the dissipation rate for E = 10⁻⁷ in Fig. 2 increases 
to Φ = 1.8 × 10⁻⁶ at E = 10⁻¹⁵ for an internal magnetic field of 1 mT. 
By comparison, the (dimensionless) kinetic energy of the motion is 
4.4 × 10⁻³. This level of dissipation is sufficient to explain an anomalous 
source of dissipation in the Earth’s nutations. 
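
The size of the extrapolation can be made concrete. The sketch below assumes the scaling Φ ∝ E^(−2/3) holds unchanged between the computed and the realistic Ekman numbers; the reference values in the example are illustrative inputs, not results taken from the calculations.

```python
def extrapolate_dissipation(phi_ref, ekman_ref, ekman_target):
    """Scale a dissipation rate between Ekman numbers,
    assuming phi is proportional to E**(-2/3)."""
    return phi_ref * (ekman_target / ekman_ref) ** (-2.0 / 3.0)

# Amplification over eight decades in E, for a unit reference dissipation.
factor = extrapolate_dissipation(1.0, 1.0e-7, 1.0e-15)
print(f"amplification factor ~ {factor:.3g}")
```

Lowering E by eight orders of magnitude raises the dissipation by a factor of about 2 × 10⁵ under this scaling.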
Figure 2 | Ohmic dissipation in the core due to internal shear layers. 
Decreasing the Ekman number E causes higher ohmic dissipation: a series 
of calculations confirms the expected dependence, Φ ∝ E^(−2/3), which is 
indicated by the slope of the solid line. The quality factor is calculated from 
equation (1) using 2E_k = A_s|Ω_s − Ω_m|², where A_s is the moment of inertia of 
the inner core. The numerical calculations use a constant-amplitude tilt of 
|Ω_s − Ω_m| = Ω₀, corresponding to a unit-amplitude tilt in non-dimensional 
variables. The non-dimensional kinetic energy is E_k = 4.4 × 10⁻³. 


Dissipation is evident in nutation measurements as a phase lag 
between the tidal force and the Earth’s response. Corrections are rou- 
tinely applied for the effects of mantle anelasticity'’ and ocean tides"*, 
but several large phase lags remain unexplained, particularly at tidal 
periods close to the natural periods of the free modes. The measured 
phase lags are often related to the quality factor, Q, of the free modes, in 
analogy to the relationship for a damped harmonic oscillator. A low Q 
for the FICN mode is inferred by fitting nutation measurements to a 
theoretical model’’. An estimate’ based on nutation measurements 
before 2000 yielded Q = 677, whereas a more recent study® using an 
extra nine years of data and a different estimation procedure obtained 
Q = 459 ± 27. The quality factor for the predicted flow is given by 

$$Q^{-1} = \frac{\Phi\,\tau}{2\pi E_k} \qquad (1)$$

where τ = 2π/Re(ω) is the period of the motion and E_k is the total 
kinetic energy. Most of the kinetic energy in the FICN mode is due to 
relative rotation of the inner core (that is, Ω_s − Ω_m). The dependences 
of E_k and Φ on Ω_s − Ω_m are identical, so the predicted quality factor is 
independent of the amplitude of the inner-core tilt. 
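
A rough numerical reading of equation (1) is sketched below. It takes the dimensionless values Φ = 1.8 × 10⁻⁶ (for a 1 mT field) and E_k = 4.4 × 10⁻³ as read from the text, and assumes ω = 1 − 0.0025 in units of Ω₀. It also assumes the weak-field limit, in which the linear induction equation gives b ∝ B and hence Φ ∝ B², so that Q ∝ B⁻². The function names are illustrative, and the electromagnetic-force correction discussed later is not included.

```python
import math

def inverse_quality_factor(dissipation, kinetic_energy, omega):
    """Q**-1 = phi * tau / (2 * pi * E_k), with tau = 2 * pi / omega."""
    tau = 2.0 * math.pi / omega
    return dissipation * tau / (2.0 * math.pi * kinetic_energy)

def field_for_quality(q_target, q_ref, b_ref_mT=1.0):
    """Field strength giving q_target, assuming Q scales as B**-2."""
    return b_ref_mT * math.sqrt(q_ref / q_target)

phi_1mT = 1.8e-6         # dimensionless dissipation at B = 1 mT
E_k = 4.4e-3             # dimensionless kinetic energy
omega = 1.0 - 0.0025     # dimensionless frequency (assumed)

Q_1mT = 1.0 / inverse_quality_factor(phi_1mT, E_k, omega)
print(f"Q at 1 mT ~ {Q_1mT:.0f}")
for q_obs in (677.0, 459.0):
    print(f"Q = {q_obs:.0f} needs B ~ {field_for_quality(q_obs, Q_1mT):.1f} mT")
```

Under these assumptions the observed quality factors correspond to fields of about 2 mT, in line with the weak-field lower bound stated in the text.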

Figure 3 shows the predicted Q⁻¹ as a function of the internal field 
strength, B. The numerical calculations represent a ‘weak-field’ 
approximation because the electromagnetic force due to b is not 
included in the dynamics. This approximation is reasonable when the 
field is weak, but it tends to overestimate the dissipation when the field is 
strong”’. Scaling arguments suggest that the electromagnetic force is 
small in comparison with the viscous force at computationally access- 
ible values of E, so a reasonable internal field would have little influence 
on the calculated flow. Instead of including the electromagnetic force in 
the calculation, I correct the numerical results using a local analysis for 
the interaction of flow and magnetic field in the shear layers. 

The shear layers can be represented by a linear superposition of 
inertial waves with a short wavelength in the direction perpendicular 
to the layers. The amplitude of the local wavenumber is nominally 
k = l_v⁻¹, which characterizes the width of the layers. A plane-wave 
solution for the inertial waves can be coupled to the magnetic induc- 
tion equation to account explicitly for the electromagnetic force in the 
dynamics of the waves (Supplementary Information). The local plane- 
wave solution permits calculations with realistic values for viscosity 
and field strength, avoiding some of the limitations of the numerical 
calculations. Including the electromagnetic force in the dynamics 
decreases the amplitude of the waves, reducing both the electric cur- 
rent and the associated dissipation. These local solutions are used 
to obtain estimates for the ohmic dissipation with and without the 
Figure 3 | Predicted dissipation as a function of magnetic field strength. 
The numerical calculation represents a weak-field approximation because the 
electromagnetic force is not included in the dynamics. A correction for the 
electromagnetic force is based on a local analysis of inertial waves in the 
presence of a magnetic field. Ohmic dissipation due to magnetic coupling at the 
inner-core boundary requires a radial magnetic field of 7 mT or more to explain 
the observed dissipation’. Internal shear layers in the outer core explain the 
observed dissipation with an average magnetic field of 2.5 mT. 


addition of the electromagnetic force. The ratio of dissipation esti- 
mates is used to correct the numerical results in Fig. 3. 

I also calculate the dissipation due to magnetic coupling at the 
inner-core boundary”. In this case, the strength of the magnetic field 
refers to the radial component at the inner-core boundary. A strong 
radial field is required to explain the observed dissipation with magnetic
coupling’. Alternatively, a high fluid viscosity (ν = 10 m² s⁻¹) has
been proposed? as the source of dissipation. Allowing for the influence 
of internal shear layers eliminates the need for a high viscosity or a 
strong radial field at the inner-core boundary. The nutation observa- 
tions can be explained with a 2.5-mT field when the flow is corrected 
for the effects of electromagnetic forces. The weak-field limit estab- 
lishes a lower bound of roughly 2 mT. 

Numerical models'*’ and theoretical consideration” suggest an 
internal magnetic field of 1-4mT. The field strength inferred from 
tidal dissipation is compatible with these predictions, although the 
tidal estimate is not sensitive to the azimuthal component of the mag- 
netic field. This lack of sensitivity is due to the conical structure of the 
shear layers, which causes little distortion of the azimuthal field. 
However, the tidal estimate is sensitive to the field that controls the 
propagation of torsional oscillations’. Recent evidence* for a strong 
field from the propagation of torsional oscillations is consistent with 
the estimate presented here. A source of uncertainty in the tidal estim- 
ate arises from the viscosity of the fluid core because it controls the 
thickness of the shear layers. A higher viscosity reduces the shear and 
decreases the ohmic dissipation. However, the change in the ratio J,/1, 
causes an increase in dissipation. The net effect is a modest decrease in 
dissipation. Doubling the viscosity requires a 10% increase in the field 
strength to explain the observed dissipation. In spite of these uncer- 
tainties, there are few observations that constrain the strength of the 
internal field. The tidal estimate represents a core-wide average with 
sensitivity to all but the azimuthal component of the field. It is striking 
that radio emissions from distant quasars” offer insights into the 
internal magnetic field by providing a precise determination of the 
Earth’s nutations. 


METHODS SUMMARY 


The numerical calculations are based on the dynamo model of ref. 24 with the 
nonlinear terms omitted. The velocity and magnetic fields are represented using 
vector spherical harmonics in a series expansion up to degree 260. I apply stress- 
free boundary conditions to the velocity field and insulating boundary conditions to 
the magnetic field. This choice ensures that all of the viscous and ohmic dissipation 
is due to the internal shear layers. Use of no-slip boundary conditions introduces
thinner, $O(E^{1/2})$, shear layers at the boundaries and a radial flow due to Ekman
pumping, but the net effect on the dissipation is small for realistic values of E. I use 
finite differences for the radial derivatives on a uniformly spaced grid with up to 360 
radial levels. Both the velocity and magnetic fields have periodic time dependence 
with a prescribed frequency. The governing equations yield a system of algebraic 
equations for the radial coefficients of the vector spherical harmonic expansion. I 
obtained solutions iteratively using the GMRES method”. 
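
The step from a prescribed periodic time dependence to a system of algebraic equations can be illustrated on a far simpler problem than the one solved here. The sketch below is a toy under stated assumptions, not the model of this paper: it takes a scalar diffusion equation ∂u/∂t = ν ∂²u/∂r² with u = U(r)e^{iωt}, uses second-order finite differences on a uniformly spaced grid, and replaces GMRES with a direct tridiagonal (Thomas) solve. The function names, grid size and parameter values are illustrative.

```python
import cmath


def periodic_diffusion(n=1000, nu=1e-3, omega=1.0):
    """Solve i*omega*U = nu*U'' on [0, 1] with U(0) = 1 and U(1) = 0.

    Toy stand-in (an assumption, not the paper's system): writing
    u(r, t) = U(r) * exp(i*omega*t) turns the time derivative into a
    factor i*omega, so only an algebraic system for U remains.
    """
    h = 1.0 / n                       # uniform grid spacing
    m = n - 1                         # number of interior unknowns
    off = nu / h**2                   # sub- and super-diagonal entries
    diag = -2.0 * off - 1j * omega    # main diagonal entry
    rhs = [0j] * m
    rhs[0] = -off                     # known boundary value U(0) = 1

    # Thomas algorithm: a direct tridiagonal solve, used here instead of GMRES.
    cp = [0j] * m
    dp = [0j] * m
    cp[0] = off / diag
    dp[0] = rhs[0] / diag
    for j in range(1, m):
        denom = diag - off * cp[j - 1]
        cp[j] = off / denom
        dp[j] = (rhs[j] - off * dp[j - 1]) / denom
    u = [0j] * m
    u[-1] = dp[-1]
    for j in range(m - 2, -1, -1):
        u[j] = dp[j] - cp[j] * u[j + 1]
    return [1 + 0j] + u + [0j]


def exact_profile(r, nu=1e-3, omega=1.0):
    """Closed-form solution of the same toy problem, for comparison."""
    k = cmath.sqrt(1j * omega / nu)
    return cmath.sinh(k * (1.0 - r)) / cmath.sinh(k)
```

Comparing `periodic_diffusion(n)` with `exact_profile(j / n)` at each grid point gives a quick check that the oscillating boundary layer is resolved. An iterative method such as GMRES becomes attractive only when the coupled system is too large for a direct solve.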


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Poincaré’s solution for the flow of an inviscid fluid can be represented by”®

$$\mathbf{v}_f = \boldsymbol{\chi}_f \times \mathbf{r} + \nabla\psi \qquad (2)$$

where $\boldsymbol{\chi}_f = \boldsymbol{\Omega}_f - \boldsymbol{\Omega}_m$ is the relative angular velocity of the fluid and $\psi$ is a scalar
potential that describes the deviation from rigid-body rotation. The continuity
condition, $\nabla \cdot \mathbf{v}_f = 0$, requires

$$\nabla^2 \psi = 0$$

which is subject to the boundary condition

$$\mathbf{v}_f \cdot \mathbf{n} = 0 \qquad (3)$$

on the normal component of the velocity at the core-mantle boundary. The
spheroidal boundary is described in Cartesian coordinates x, y and z by

$$\frac{x^2 + y^2}{a^2} + \frac{z^2}{c^2} = 1$$

where a and c are the equatorial and polar radii, respectively. The flattening of the
boundary is defined by

$$\epsilon_f = \frac{a - c}{a}$$

and the outward normal, $\mathbf{n}$, can be approximated by

$$\mathbf{n} \approx \frac{x}{a}\hat{\mathbf{x}} + \frac{y}{a}\hat{\mathbf{y}} + \frac{z}{a}(1 + 2\epsilon_f)\hat{\mathbf{z}}$$

to first order in $\epsilon_f$.

The relative angular velocity is confined to the equatorial plane for small-amplitude
motion. Consequently, I express $\boldsymbol{\chi}_f$ in the form

$$\boldsymbol{\chi}_f = \bar{\chi}_f (\hat{\mathbf{x}} + i\hat{\mathbf{y}})\, e^{i\omega t} \qquad (4)$$

where $\omega$ is the frequency of the motion. The sign convention in equation (4)
requires $\omega \approx +\Omega$ for diurnal tidal motions. Substituting equation (2) into the
boundary condition in equation (3) yields

$$\nabla\psi \cdot \mathbf{n} = -2\epsilon_f \bar{\chi}_f (yz - ixz)\, a^{-1} e^{i\omega t} \qquad (5)$$

for the normal component of the potential flow. The potential that satisfies equation (5) is

$$\psi = -\epsilon_f \bar{\chi}_f (yz - ixz)\, e^{i\omega t} \qquad (6)$$

where second-order terms in $\epsilon_f$ are neglected.
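
The claim that the potential in equation (6) cancels the normal flow of the rigid rotation to first order in the flattening can be checked numerically. The sketch below is an illustration under stated assumptions: it evaluates both contributions at a single, arbitrarily chosen point on the spheroid, uses the exact normal direction rather than its first-order approximation, and takes unit amplitude and unit equatorial radius. The function name and default arguments are hypothetical.

```python
import cmath
import math


def normal_flow(eps, theta=0.7, phi=0.3, a=1.0, chi=1.0, omega_t=0.0):
    """Return (rigid, total): the normal components of chi x r and of
    chi x r + grad(psi) at one point on the spheroidal boundary."""
    c = a * (1.0 - eps)                       # polar radius from the flattening
    x = a * math.sin(theta) * math.cos(phi)   # point on the spheroid
    y = a * math.sin(theta) * math.sin(phi)
    z = c * math.cos(theta)
    amp = chi * cmath.exp(1j * omega_t)

    # chi_f = amp * (x_hat + i*y_hat), as in equation (4).
    cx, cy, cz = amp, 1j * amp, 0.0
    rot = (cy * z - cz * y, cz * x - cx * z, cx * y - cy * x)

    # Gradient of psi = -eps * amp * (y*z - i*x*z), equation (6).
    grad = (-eps * amp * (-1j * z), -eps * amp * z, -eps * amp * (y - 1j * x))

    # Normal direction from the gradient of the spheroid equation, scaled
    # so that it reduces to r/a on a sphere (it is not of unit length).
    n = (x / a, y / a, z * a / c**2)

    rigid = sum(v * w for v, w in zip(rot, n))
    total = rigid + sum(v * w for v, w in zip(grad, n))
    return rigid, total
```

With a small flattening, `abs(total)` should be much smaller than `abs(rigid)`, and doubling `eps` should roughly quadruple `abs(total)`. That behaviour is the signature of a residual that is second order in the flattening.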

This solution also satisfies boundary conditions at the inner-core boundary 
when the relative angular velocity of the inner core, $\boldsymbol{\chi}_s = \boldsymbol{\Omega}_s - \boldsymbol{\Omega}_m$, equals zero
and the flattening, $\epsilon_s$, of the inner-core boundary equals $\epsilon_f$. (The difference in
flattening is about 5% for a hydrostatic Earth’’.) Departure from either of these 
conditions requires a correction to the potential flow. In principle, the potential 
flow can be adapted to satisfy boundary conditions at both the core-mantle 
boundary and the inner-core boundary. However, the resulting flow no longer 
satisfies the governing equations for an inviscid, rotating fluid. 

The difficulty arises from the structure of the potential flow. The form of the 
solution in equation (2) ensures that the fluid has uniform vorticity because the 
potential flow has no vorticity. The solution for $\psi$ in equation (6) is linear in z,
which means that the vertical velocity is constant and the vertical strain rate,
$\partial^2\psi/\partial z^2$, is zero. Thus the potential flow does not alter the uniform vorticity by
stretching the dominant z component associated with steady rotation. In other
words, the form of the potential flow is compatible with the assumption of
uniform vorticity in the flow. However, when the potential flow is modified to 
accommodate more general boundary conditions at the surface of the inner 
core, the resulting potential flow is no longer linear in z. Vertical strain due to 
the potential flow stretches the z component of vorticity, causing departures 
from uniform vorticity. As a result, the potential flow is now incompatible with 
the assumption of uniform vorticity. This means that a potential flow is insuf- 
ficient to describe the deviations from rigid-body rotation in the more general 
case when a solid inner core is present. 
A more general solution for flow in the core is represented by

$$\mathbf{v}_f = \boldsymbol{\chi}_f \times \mathbf{r} + \mathbf{v}'$$

where the deviation, $\mathbf{v}'$, from rigid-body rotation is expanded in vector spherical
harmonics”. The boundary condition on the normal component of velocity at the
inner-core boundary requires that

$$\mathbf{v}' \cdot \mathbf{n} = 2\left[\bar{\chi}_s \epsilon_s + \bar{\chi}_f(\epsilon_f - \epsilon_s)\right](yz - ixz)\, a_s^{-1} e^{i\omega t} \qquad (7)$$

where $a_s$ is the equatorial radius of the inner core and $\bar{\chi}_s$ is the amplitude of $\boldsymbol{\chi}_s$. The
difference $\epsilon_f - \epsilon_s$ is small in comparison with $\epsilon_s$, and the amplitude of $\boldsymbol{\chi}_f$ is small in
comparison with that of $\boldsymbol{\chi}_s$ when the angular momentum of the fluid and solid core
is conserved. To focus on the role of the inner core in the numerical calculations, I
set $\epsilon_f = 0$ and approximate equation (7) by

$$\mathbf{v}' \cdot \hat{\mathbf{r}} = 2\left[\bar{\chi}_s - \bar{\chi}_f\right]\epsilon_s (yz - ixz)\, a_s^{-1} e^{i\omega t}$$

where $\mathbf{n} = \hat{\mathbf{r}}$ is valid to leading order in $\epsilon_s$. In addition, I use stress-free boundary
conditions at both the inner-core boundary and the core-mantle boundary and set
$\mathbf{v}' \cdot \hat{\mathbf{r}} = 0$ at the core-mantle boundary. The non-zero radial motion at the inner-
core boundary requires care in the definition of stress-free boundary conditions.

Both $\boldsymbol{\chi}_f$ and $\boldsymbol{\chi}_s$ are prescribed in the numerical calculations for $\mathbf{v}'$. The amplitude
and phase are chosen to conserve angular momentum in the core, and the frequency
of the motion is $\omega = \Omega(1 - \epsilon_s)$, corresponding to the frequency of the
FICN in the absence of elastic deformation and electromagnetic coupling at the 
fluid—solid boundaries. 

The ratio of dissipation to kinetic energy (or $Q^{-1}$) is independent of the amplitude
of the imposed rotation, so I set $\bar{\chi}_s = 1$ and $\bar{\chi}_f = -A_s/A_f$, where $A_s$ and $A_f$ are
the moments of inertia of the solid and fluid cores, respectively (assuming constant
density in the fluid and solid). Similar results for the shear flow and dissipation are
obtained if a value for $\bar{\chi}_s$ is imposed and the total velocity, $\mathbf{v}_f$ (rather than $\mathbf{v}'$), is
expanded in vector spherical harmonics. In this case, the numerical solution
includes the rigid rotation of the fluid core. Conservation of angular momentum
can be used as a check on the calculations at low values of E. However, the flow
associated with the average rotation of the fluid is larger by a factor of roughly $1/\epsilon_s$
than the deviation about a rigid rotation. Consequently, it is preferable to solve
directly for the deviation, $\mathbf{v}'$, using imposed values for $\bar{\chi}_s$ and $\bar{\chi}_f$. In either case, a
rigid rotation is removed from the flow before solving the induction equation for 
the magnetic perturbation because the main magnetic field is specified in a frame 
that rotates with the fluid. The use of electrically insulating boundary conditions in 
the solution of the induction equation ensures that magnetic perturbations are due 
solely to the effect of shear layers in the flow. 
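
The choice of the fluid amplitude from conservation of angular momentum can be made concrete with a short calculation. This is a rough sketch only: it assumes the same uniform density in the solid and fluid cores, so that the moment of inertia of a sphere of radius R scales as R⁵, and it uses nominal core radii that are not taken from this paper. The function and constant names are hypothetical.

```python
def inertia_ratio(r_inner, r_outer):
    """A_s / A_f for a uniform sphere inside a uniform shell of equal density.

    Both moments share the factor (8*pi/15)*rho, which cancels in the ratio.
    """
    return r_inner**5 / (r_outer**5 - r_inner**5)


# Assumed nominal radii in km (illustrative, not from the paper).
R_INNER_CORE = 1221.5
R_OUTER_CORE = 3480.0

chi_s = 1.0                                                  # imposed solid amplitude
chi_f = -inertia_ratio(R_INNER_CORE, R_OUTER_CORE) * chi_s   # fluid amplitude

# Net angular momentum of the two relative rotations, in units of A_f.
net = inertia_ratio(R_INNER_CORE, R_OUTER_CORE) * chi_s + chi_f
```

Under these assumptions the fluid amplitude is smaller than the solid amplitude by more than two orders of magnitude, in line with the statement that the amplitude of the fluid rotation is small when angular momentum is conserved.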
Greenhouse gas mitigation can reduce sea-ice loss
and increase polar bear persistence


Steven C. Amstrup¹†, Eric T. DeWeaver², David C. Douglas³, Bruce G. Marcot⁴, George M. Durner¹, Cecilia M. Bitz⁵ & David A. Bailey⁶


On the basis of projected losses of their essential sea-ice habitats, a 
United States Geological Survey research team concluded in 2007 
that two-thirds of the world’s polar bears (Ursus maritimus) could 
disappear by mid-century if business-as-usual greenhouse gas 
emissions continue’ *. That projection, however, did not consider 
the possible benefits of greenhouse gas mitigation. A key question 
is whether temperature increases lead to proportional losses of sea- 
ice habitat, or whether sea-ice cover crosses a tipping point and 
irreversibly collapses when temperature reaches a critical thresh- 
old**. Such a tipping point would mean future greenhouse gas 
mitigation would confer no conservation benefits to polar bears. 
Here we show, using a general circulation model’, that substan- 
tially more sea-ice habitat would be retained if greenhouse gas rise 
is mitigated. We also show, with Bayesian network model out- 
comes, that increased habitat retention under greenhouse gas 
mitigation means that polar bears could persist throughout the 
century in greater numbers and more areas than in the business- 
as-usual case’. Our general circulation model outcomes did not 
reveal thresholds leading to irreversible loss of ice®; instead, a linear 
relationship between global mean surface air temperature and sea- 
ice habitat substantiated the hypothesis that sea-ice thermodyn- 
amics can overcome albedo feedbacks proposed to cause sea-ice 
tipping points*®*. Our outcomes indicate that rapid summer ice 
losses in models’ and observations®”° represent increased volatility 
of a thinning sea-ice cover, rather than tipping-point behaviour. 
Mitigation-driven Bayesian network outcomes show that previ- 
ously predicted declines in polar bear distribution and numbers’ 
are not unavoidable. Because polar bears are sentinels of the Arctic 
marine ecosystem" and trends in their sea-ice habitats foreshadow 
future global changes, mitigating greenhouse gas emissions to 
improve polar bear status would have conservation benefits 
throughout and beyond the Arctic’. 

Polar bears are dependent on the sea ice for access to their marine 
mammal prey’*”*, and occur only in Northern Hemisphere marine 
areas that are ice covered for long enough periods to allow sufficient 
foraging opportunity. Observed declines in summer sea ice have been 
associated with declining physical stature and condition, poorer sur- 
vival and declining population size*’>’®. The anticipated future loss of 
sea-ice habitats resulting from global warming! was the principal dri- 
ver of polar bear declines projected by the United States Geological 
Survey (USGS) studies*’’”. Improved management of hunting and 
other human activities was found unable to materially alter this out- 
come (see plate 6 in ref. 3). 

The USGS studies relied on general circulation model (GCM)- 
projected losses of Arctic sea ice based on the Special Report on 
Emissions Scenarios (SRES)'* A1B ‘business as usual’ greenhouse gas 
emissions scenario. Recent emissions trends make it clear that without 
mitigation little departure from the 2007 polar bear projections could be 
expected””. Also, the hypothesis that the climate system contains tipping
elements* means that habitats supporting cold-dependent species could
disappear abruptly and irreversibly when a particular global mean sur- 
face air temperature (GMAT) is exceeded*. It has been proposed” that 
existing greenhouse gas emissions already have committed the earth to 
temperatures that will rise above the tipping point for loss of perennial 
Arctic sea ice. The perception that nothing can be done to avoid cata- 
strophic losses and ultimate disappearance of polar bears was exempli- 
fied in 2007 when the general media proclaimed polar bears were 
irreversibly doomed”. 

We used projections of twenty-first century GMAT and sea-ice extent 
from the Community Climate System Model version 3 (CCSM3)’ to test 
the hypothesis that a tipping point* ** will lead to irreversible loss of sea- 
ice habitats as GMAT increases. We used a Bayesian network model’ to 
evaluate whether mitigating greenhouse gas rise could improve the 
future outlook for polar bears compared to previous projections. 

CCSM3 simulations were forced with greenhouse gas concentra- 
tions from five emissions scenarios (Supplementary Table 1): SRES“® 
A1B and B1; the 2000 (Y2K) climate change commitment scenario”; 
Figure 1 | Changes from the present in polar bear habitat features varied
greatly among greenhouse gas scenarios. a–d, The DIV is illustrated here.
Shown are changes in optimal polar bear foraging habitat (a), extent of sea ice
over continental shelves (b), number of months continental shelves are ice free
(c) and the distance from the shelf edge to the edge of the perennial pack ice
(d), as projected by CCSM3 with four greenhouse gas scenarios (defined in text).
Thin lines plot annual averages of the model runs under each greenhouse
gas scenario, with error bars showing data ± 1 s.d. Bold lines are 10-year
centred running averages of the annual mean values. OBS is observed passive
microwave satellite data; black dots are the annual satellite-observed values.


¹US Geological Survey, Alaska Science Center, 4210 University Drive, Anchorage, Alaska 99508, USA. ²National Science Foundation, 4201 Wilson Blvd., Arlington, Virginia 22230, USA. ³US Geological
Survey, Alaska Science Center, 3100 National Park Road, Juneau, Alaska 99801, USA. ⁴USDA Forest Service, PNW Research Station, 620 SW Main St., Suite 400, Portland, Oregon 97205, USA.
⁵Atmospheric Sciences, University of Washington, Seattle, Washington 98195, USA. ⁶National Center for Atmospheric Research, 1850 Table Mesa Dr, Boulder, Colorado 80305, USA. †Present address:
Polar Bears International, 810 N. Wallace, Suite E, P.O. Box 3008, Bozeman, Montana 59772, USA.


16 DECEMBER 2010 | VOL 468 | NATURE | 955 


©2010 Macmillan Publishers Limited. All rights reserved 


LETTER 


[Figure 2 graphic: panels a–d, each plotted against global temperature change (°C).]


Figure 2 | Relationship between GMAT change and change in polar bear 
habitat features is essentially linear. a—d, The DIV is illustrated here. The 
optimal polar bear foraging habitat (a), extent of sea ice over continental shelves 
(b), number of months continental shelves are ice free (c) and the distance from 
the shelf edge to the edge of the perennial pack ice (d). Linear relationship 
between habitat and GMAT changes does not support the tipping-point 
hypothesis. Projections are from CCSM3 running four different greenhouse gas 
scenarios (defined in text). 


the Level 1 stabilization scenario (CCSP450) of the United States 
Climate Change Science Program’; and the alternative scenario 
(AS)*. We pooled the AS and CCSP450 realizations into a 5-run 
mitigation (MIT) ensemble. 

Reduced radiative forcing with greenhouse gas mitigation resulted
in cooler temperatures, greater sea-ice retention (Supplementary Figs 3
and 4) and less change in important polar bear habitat features (Fig. 1).
Importantly, the relationship between GMAT and projected habitat
change was largely linear (Fig. 2). Even in September, the month of
minimum ice cover, sea ice and polar-bear-habitat availability decreased
smoothly as GMAT increased, regardless of the greenhouse gas
scenario (Supplementary Figs 5 and 6).

Figure 3 | September sea-ice extent (50% concentration) recovers from a
RILE in a 2020 greenhouse gas commitment realization. In the 2020
commitment realization, which was integrated from the same initial state as the
A1B reference realization, greenhouse gas concentrations followed the A1B
scenario until 2020, and were fixed thereafter. RILEs occurred in both
realizations during the decade of the 2020s. In contrast to the reference run (red
line), the substantial sea-ice recovery in the 2020 commitment scenario (purple
line) supports the concept that RILEs represent natural sea-ice variability
superimposed on a secular warming-induced sea-ice decline, rather than
tipping points. All lines represent 10-year running averages compiled from the
annual data.

We rejected the null hypothesis that there is a tipping point*°* of
perennial Arctic sea-ice collapse by our failure to find a critical temperature
threshold in our GCM outcomes. Our model outcomes support
the alternative hypothesis that sea-ice thermodynamics can
dominate and reduce the destabilizing effects of the ice-albedo feedback
on summer sea-ice cover®”>*.

Figure 4 | Future polar bear persistence varies among ecoregions and
greenhouse gas scenarios. Bayesian network model projected outcomes
(coloured bars) are shown for each of four greenhouse gas scenarios, four future
decades, and four ecoregions. Although substantial risk of extirpation
continues for the SEA and DIV even with mitigation, increased levels of
greenhouse gas mitigation improve the probability of future polar bear
persistence in all ecoregions. In the x-axis legend, we refer to the decades of
2020-2029, 2045-2054, 2070-2079 and 2090-2099 as years 25, 50, 75 and 95.

To test further for evidence of tipping-point behaviour, we com- 
pared rapid ice-loss events (RILEs)’”” in CCSM3 realizations using 
A1B"* greenhouse gas levels and levels from a 2020 commitment 
integration in which greenhouse gas concentrations followed A1B 
until 2020 and were fixed at 2020 levels thereafter. In the A1B reference 
run, a RILE occurred between 2020 and 2030, and September Arctic 
sea ice largely disappeared by mid-century (Fig. 3). If RILEs represent 
tipping-point behaviour, as suggested’, the 2020 commitment run 
should have shown either no RILE or the same kind of permanent 
ice loss following a RILE as the reference run—depending on whether 
the climate system in that realization crossed the tipping point. 

A RILE did occur in the 2020 commitment run. Instead of proceed- 
ing towards permanent ice loss as in the reference run, however, the 
RILE in the 2020 commitment run was followed by partial recovery 
and substantial retention of September sea-ice cover through the cen- 
tury (Fig. 3). Because the 2020 commitment run was integrated from 
the same 2020 initial state as the A1B reference, it experienced the 
same near-term natural variability, including a RILE during the 2020s. 
The 2020 commitment run did not proceed to an irreversible and 
unstoppable loss of remaining ice®, presumably because the long-term 
ice loss in CCSM3 is dictated by greenhouse gas radiative forcing and 
consequent global warming, which are substantially lower for the 2020 
commitment run than A1B. This outcome indicates that RILEs are 
caused by the increased volatility of a thinner and more sensitive sea- 
ice cover, rather than the sea ice crossing an albedo-induced threshold 
from which it cannot return’”*”’. 

The linear relationship between GMAT and sea-ice habitat change, 
and the return of sea ice after the RILE in our 2020 commitment 
experiment confirm that there is no tipping point*®* for summer 
Arctic sea ice in the CCSM3 climate model. We recognize that the 
absence of tipping points in a climate model does not guarantee that 
tipping-point behaviour will not occur in the real world. We recognize 
also that absence of tipping-point behaviour in one GCM does not 
necessarily mean that tipping points would not be present in other 
GCMs. Because sea-ice loss in CCSM3 is more sensitive to GMAT rise
than in other GCMs”, however, it provides an appropriate and important
platform to test the tipping-point hypothesis (Supplementary
Information). If the most sensitive of GCMs to greenhouse gas forcing 
does not illustrate tipping-point behaviour, we would not expect such 
behaviour in other, less sensitive models. 

The finding that RILEs in model outcomes result from increased 
volatility of an ice cover that is progressively thinning because of 
warming temperatures—rather than tipping-point behaviour—is con- 
sistent with recently observed summer sea-ice declines. The sea-ice 
loss between September 2006 and September 2007, which was roughly 
equal to the entire loss of September ice extent between 1979 and 2006, 
encouraged speculation that a tipping point might have been crossed’. 
Yet, the 2008 and 2009 minima, although well below the long-term 
mean, were less severe than the record set in 2007". Major losses of 
summer sea ice can thus occur, both in models and in observations, 
without pushing the sea ice past a tipping point into a permanent state 
of ice-free summers®’°”®. Instead of tipping-point behaviour, recent 
observations and model outcomes illustrate great natural variability 
superimposed on a secular warming-induced sea-ice decline. 
Controlling temperature increase, therefore, is the key to preserving 
sea-ice habitat. 

We derived Bayesian network projections, informed by CCSM3 
habitat projections, for polar bear populations in four ecoregions 
(Supplementary Fig. 8). With A1B habitat values, polar bears were 
most likely to disappear from the Seasonal Ice Ecoregion (SEA) and 
Polar Basin Divergent Ice Ecoregion (DIV) by mid-century, and to be 
substantially reduced in the Archipelago Ecoregion (ARC) and the 
Polar Basin Convergent Ice Ecoregion (CON). With MIT habitat 
values, extinction probabilities were much lower in all ecoregions
(Fig. 4). Contrary to the A1B case, when greenhouse gas mitigation
was combined with best on-the-ground management practices (for
example, controlling hunting and other interactions with humans)
extinction was not the most probable outcome in any ecoregion, and
future population sizes in the CON and ARC could be equivalent to or
even larger than at present (Fig. 5). Greenhouse gas mitigation that
keeps GMAT rise below 1.25 °C combined with traditional wildlife
management could, it seems, maintain polar bear numbers at sustainable
although lower-than-present levels throughout the century (Supplementary
Information).

Figure 5 | Greenhouse gas mitigation and best possible wildlife
management could allow polar bears to persist throughout current range.
Bayesian network outcomes with habitat inputs from the MIT scenario are
shown for the last decade of the twenty-first century. When temperature rise is
kept at or below the MIT scenario and when on-the-ground management of
harvest, bear-human interactions, oil and gas activities etc. is maximized
(influence run no. 2), extinction is not the most probable outcome in any of the
four ecoregions.


METHODS SUMMARY 


Relationships between temperature and habitat. We evaluated relationships 
between GMAT change and four habitat variables important to polar bear foraging 
success: resource-selection-function-based optimal habitat’; the temporal and spa- 
tial extent of sea ice over shallow continental shelf waters”'*'’; and the distance ice 
retreated from the continental shelf. GMAT change was calculated as the differ- 
ence between the mean temperature of 1980-1999 (13.67 °C), and the future 
temperatures projected by CCSM3 under the different greenhouse gas scenarios. 
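
The GMAT-change calculation and the test for a linear habitat response can be sketched as follows. This is a minimal illustration, not the analysis code used for the paper: only the 13.67 °C baseline comes from the text, while the temperature and habitat values, the function names and the use of an ordinary least-squares fit are assumptions made for the example.

```python
BASELINE_GMAT_C = 13.67  # 1980-1999 mean stated in the text


def gmat_change(projected_c):
    """Temperature change relative to the 1980-1999 baseline."""
    return [t - BASELINE_GMAT_C for t in projected_c]


def linear_fit(x, y):
    """Ordinary least-squares line y = slope*x + intercept, with R^2."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    syy = sum((yi - my) ** 2 for yi in y)
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r2


# Hypothetical values for illustration only (not model output).
temps_c = [14.17, 14.67, 15.17, 15.67, 16.17]
habitat_change_pct = [-9.0, -21.0, -29.0, -41.0, -50.0]

dT = gmat_change(temps_c)
slope, intercept, r2 = linear_fit(dT, habitat_change_pct)
```

An R² close to 1 with no systematic break in the residuals is what a proportional response would look like. An abrupt threshold would instead show up as a sharp change in slope at some temperature.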
Effects of habitat alteration on polar bears. We projected the effects of habitat 
alteration on polar bear persistence with a Bayesian network model’ modified to 
include inputs from other subject matter experts. Our Bayesian network model 
incorporated changes in four habitat variables projected for each of four ecore- 
gions (Supplementary Fig. 8), with four greenhouse gas scenarios. The Bayesian 
network model also was informed by the broad range of other currently available
information including: potential anthropogenic stressors; the established links 
between reduced physical stature and survival and declining sea-ice availability 
among polar bears in parts of their range”’>"'’; qualitative information indicating 
that similar processes are underway in parts of the polar bear range where quant- 
itative data are not yet available; the fact that polar bears ultimately are dependent 
on the sea ice’*"* for consistent foraging success; and knowledge that if green- 
house-gas-induced warming continues to increase, essential polar bear sea-ice 
habitats ultimately will disappear’’. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


GCM and scenarios. We used five emissions scenarios in our CCSM3 experi- 
ments. The Y2K scenario fixes atmospheric greenhouse gas concentrations at year 
2000 levels**. The CCSP450”* scenario keeps end of century total anthropogenic 
radiative forcing below 3.4 W m⁻², whereas the AS does not allow anthropogenic
radiative forcing to exceed 1.5 W m⁻² above year 2000 levels. In A1B and B1, CO₂
rises to 689 p.p.m. and 537 p.p.m., respectively, by 2100 (using the CCSM3 concentration values,
see Supplementary Fig. 1 and Supplementary Table 1). Greenhouse gas concen- 
trations for the CCSP450°* and SRES’® scenarios were calculated from the emis- 
sions specified for these scenarios with the Model for the Assessment of 
Greenhouse Gas Induced Climate Change”, a globally averaged gas-cycle/climate 
model. See ref. 12 for discussion of the CCSP greenhouse gas concentrations, and 
ref. 31 for details of the SRES’* integrations. Greenhouse gas concentrations used 
in the AS are in supporting table 2 found at http://www.pnas.org/content/101/46/ 
16109/suppl/DC1. 

We obtained eight realizations each for A1B and B1, four realizations each for 
CCSP450 and Y2K, and one realization of the AS. Because net radiative forcing in 
the AS and CCSP450 was similar (Supplementary Fig. 2), and because global
temperature change (Supplementary Fig. 3) and change in sea-ice extent 
(Supplementary Fig. 4) projected by the single AS run were very similar to mem- 
bers of the 4-run ensemble of CCSP450, we combined the single AS run with the 4 
CCSP450 runs to create a 5-run mitigation ensemble (MIT). This left us with 4 
forcing ensembles with which to compare the projected effects on the future 
welfare of polar bears: A1B, B1, Y2K and MIT. 
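
The pooling of realizations into the MIT ensemble, and the per-year ensemble statistics plotted from it, might be sketched as below. The ice-extent values are hypothetical, the function names are illustrative, and the use of the sample standard deviation is an assumption, because the text does not say which estimator lies behind the ±1 s.d. error bars.

```python
from statistics import mean, stdev


def pool_ensemble(*groups):
    """Concatenate realizations from several scenarios into one ensemble."""
    pooled = []
    for runs in groups:
        pooled.extend(runs)
    return pooled


def ensemble_stats(runs):
    """Per-year ensemble mean and sample standard deviation."""
    years = zip(*runs)  # regroup the values year by year
    return [(mean(v), stdev(v)) for v in years]


# Hypothetical September ice extents (10^6 km^2) for three years.
ccsp450_runs = [[6.1, 5.8, 5.5], [6.3, 5.9, 5.2], [6.0, 5.7, 5.6], [6.2, 6.0, 5.3]]
as_runs = [[6.4, 5.6, 5.4]]

mit = pool_ensemble(ccsp450_runs, as_runs)   # 4 CCSP450 runs + 1 AS run
stats = ensemble_stats(mit)
```

Pooling treats the single AS run as one more member of the CCSP450 ensemble, which is reasonable only because the forcing and the simulated response of the two scenarios were found to be similar.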

GMAT change was calculated as the difference between the annual mean tem- 
perature of 1980-1999 (13.67 °C), and the future temperatures projected by 
CCSM3 under the different greenhouse gas scenarios we examined. We derived 
the 1980-1999 mean from 8 CCSM3 model runs incorporating greenhouse gas 
increases observed through the twentieth century (20C3M ensemble)”. 
Ecoregions. We evaluated how mitigation might affect polar bears occupying four 
Arctic ecoregions defined by temporal and spatial differences in observed ice melt, 
freeze, advection, bathymetry, proximity to land, and polar bear responses to those 
patterns (Supplementary Fig. 8). Each ecoregion is large, composed of several 
recognized subdivisions of the global polar bear population*”’, and not entirely 
homogeneous. Nonetheless, they offer useful subdivisions of the worldwide polar 
bear distribution because areas within each tend to be more similar than they are to 
portions of other ecoregions. 

The SEA includes Hudson Bay, Foxe Basin, Baffin Bay and Davis Strait. There,
sea ice melts entirely in summer and the ~7,500 bears occurring there are forced
ashore for extended periods during which they are largely food deprived. The 
ARC—the channels between the Canadian Arctic Islands—is presently home to 
~5,000 bears and is characterized by heavy sea ice, much of which is present year 
round. The polar basin (the portion of the Arctic Ocean centred on the North Pole 
and ringed by the continental shelves of Eurasia, North America, Greenland and 
the Canadian Archipelago; Supplementary Fig. 8) was divided into a DIV, includ- 
ing the Southern Beaufort, Chukchi, East Siberian-Laptev, Kara and Barents Seas, 
and a CON including the east Greenland Sea, the continental shelf areas adjacent 
to northern Greenland and the Queen Elizabeth Islands, and the northern 
Beaufort Sea. Extensive formation of annual sea ice occurs in the DIV where 
~8,500 bears currently occur. That ice typically is advected towards the central 
polar basin, out of the polar basin through Fram Strait, or against the CON. The 
CON is currently home to ~2,400 polar bears. Differences among ecoregions 
acknowledge that global warming effects on sea-ice habitats have different starting 
points'* and that the nature of sea-ice changes is likely to be different. 
Habitat metrics. We examined the relationship between GMAT change and four 
habitat variables known to be important to polar bears. First, we adopted the 
resource selection function (RSF) approach previously described’ to convert 
GCM projections of sea-ice extent to projections of optimal polar bear habitat. 
RSFs are quantitative expressions of the habitats animals choose to utilize, relative 
to available habitats and resources”. Sea-ice concentrations for the observational 
period were estimated from monthly passive-microwave (PMW) satellite 
imagery”. Choices polar bears made from among available habitats were deter- 
mined from 1985-1995 satellite radiolocations'. Optimal habitat was defined as 
any mapped pixel with an RSF value in the upper 20% of the seasonally averaged 
(1985-1995) RSF scores, and could be expressed as the sum of qualifying mapped 
pixels over any period of interest. We assessed changes in habitat availability by 
comparing annual sums of optimal habitat among projected time periods’. 
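
The definition of optimal habitat above can be illustrated with a minimal Python sketch. It assumes that RSF scores are available as flat lists of per-pixel values; the function names and toy values are hypothetical, and the nearest-rank cutoff is only one reasonable way to take the upper 20%.

```python
def optimal_habitat_threshold(baseline_scores, top_fraction=0.20):
    """RSF value that marks the upper `top_fraction` of the baseline scores.

    Assumption: a nearest-rank cutoff on the pooled baseline (1985-1995) scores.
    """
    if not baseline_scores:
        raise ValueError("baseline_scores must not be empty")
    ranked = sorted(baseline_scores)
    n_top = max(1, round(len(ranked) * top_fraction))
    return ranked[-n_top]


def count_optimal_pixels(rsf_map, threshold):
    """Number of mapped pixels whose RSF value reaches the threshold."""
    return sum(1 for value in rsf_map if value >= threshold)


# Hypothetical toy data: ten baseline scores and one later map of five pixels.
baseline = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
threshold = optimal_habitat_threshold(baseline)
later_map = [5, 9, 10, 3, 12]
optimal_pixels = count_optimal_pixels(later_map, threshold)
```

Summing `count_optimal_pixels` over the maps of a period gives the kind of annual total that is compared among time periods.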

Estimates of optimal habitat were limited to the polar basin because only there 
did we have access to the radio-tracking data necessary to build RSF models. Sea
ice over continental shelves, however, is widely recognized as an
important component of polar bear habitat. Therefore, we derived a second
habitat variable we called ‘total shelf-ice habitat’ from both observed and projected 
Arctic-wide sea-ice concentration maps. Total shelf-ice habitat was defined as the
areal cover (km²) of all pixels with ≥50% ice concentration that were mapped
over the continental shelves (<300 m depth). Waters with less than 50% ice cover 
were denoted ice-free because available data indicate that areas with sea-ice 
coverage <50% may not be preferred’. Unlike optimal habitat, total shelf-ice 
habitat could be calculated in all ecoregions and therefore provided a means of 
quantifying projected changes in habitat availability throughout the range of polar 
bears. We compared shelf-ice habitat expressed as the annual 12-month sum of sea- 
ice extent over the continental shelves in each ecoregion. Because SEA and ARC are 
almost entirely continental shelf area, the total shelf-ice habitat in those ecoregions 
equated to the total annual area (sum of 12 months) of ≥50% concentration sea ice.
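
As a rough illustration of the total shelf-ice habitat calculation, the sketch below sums the area of qualifying pixels for each month and then over 12 months. The tuple layout, the function names and the 625 km² pixel area are assumptions made for the example, not details taken from the text.

```python
def shelf_ice_area(pixels, min_concentration=50.0, max_depth_m=300.0):
    """Area (km^2) of pixels over the shelf (< max_depth_m) with enough ice.

    `pixels` is an iterable of (area_km2, depth_m, concentration_pct) tuples.
    """
    return sum(
        area
        for area, depth, concentration in pixels
        if depth < max_depth_m and concentration >= min_concentration
    )


def annual_shelf_ice_habitat(monthly_pixel_maps):
    """Annual 12-month sum of monthly shelf-ice area (km^2)."""
    return sum(shelf_ice_area(month) for month in monthly_pixel_maps)


# Hypothetical month: four pixels of 625 km^2 each.
example_month = [
    (625.0, 100.0, 80.0),  # shelf, enough ice: counted
    (625.0, 100.0, 40.0),  # shelf, <50% ice: treated as ice-free
    (625.0, 500.0, 90.0),  # deeper than 300 m: not shelf
    (625.0, 250.0, 50.0),  # shelf, exactly 50%: counted
]
monthly_area = shelf_ice_area(example_month)
annual_area = annual_shelf_ice_habitat([example_month] * 12)
```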

The third habitat variable, one of the most important variables representing
seasonal changes in habitat available to polar bears, was calculated as the
change from present in the number of months that ice was projected to be absent
(ice-free months) from the continental shelves. An ice-free month occurred in an
ecoregion when <50% of the shelf area was covered by sea ice of ≥50% concentration.
Outside the polar basin this variable represented simply the ice-free season
because the SEA and ARC are composed almost entirely of continental shelf. 
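
The ice-free month rule can be sketched as follows. This is a minimal example that assumes each month is given as a list of shelf pixels for one ecoregion; the data layout and function names are hypothetical.

```python
def is_ice_free_month(shelf_pixels, min_concentration=50.0, coverage_cutoff=0.5):
    """True when less than half of the shelf area carries >=50% ice.

    `shelf_pixels` is a list of (area_km2, concentration_pct) tuples.
    """
    total_area = sum(area for area, _ in shelf_pixels)
    if total_area <= 0:
        raise ValueError("shelf area must be positive")
    covered = sum(area for area, conc in shelf_pixels if conc >= min_concentration)
    return covered / total_area < coverage_cutoff


def count_ice_free_months(monthly_shelf_pixels):
    """Number of ice-free months in one year of monthly maps."""
    return sum(1 for month in monthly_shelf_pixels if is_ice_free_month(month))


def change_in_ice_free_months(projected_year, present_year):
    """Change from present in the number of ice-free months."""
    return count_ice_free_months(projected_year) - count_ice_free_months(present_year)


# Hypothetical months with three equal-area shelf pixels each.
iced_month = [(1.0, 90.0), (1.0, 60.0), (1.0, 10.0)]  # 2/3 of shelf covered
open_month = [(1.0, 90.0), (1.0, 10.0), (1.0, 0.0)]   # 1/3 of shelf covered
present = [iced_month] * 10 + [open_month] * 2
projected = [iced_month] * 7 + [open_month] * 5
change = change_in_ice_free_months(projected, present)
```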

Recognizing that the magnitude of the separation of the sea ice from preferred 
foraging areas also might be important, we calculated a fourth habitat variable as 
the change in average distance from the continental shelf to the ice pack during the 
month of minimum ice extent (shelf-to-ice distance). Shelf-to-ice distance was 
calculated, for the month of minimum ice extent, as the mean distance from every 
shelf pixel in either of the polar basin ecoregions to the nearest ice-covered pixel 
(>50% concentration) in the main body of perennial ice. We did not calculate 
shelf-to-ice distance in SEA and ARC because they are almost entirely comprised 
of continental shelf. 
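
The shelf-to-ice distance can be approximated with a brute-force nearest-neighbour search, as in the sketch below. It assumes pixel centres are given in a projected planar coordinate system in kilometres and that the list of ice pixels has already been restricted to the main body of perennial ice; both are assumptions for the example, as are the function names.

```python
import math


def mean_shelf_to_ice_distance(shelf_pixels, ice_pixels):
    """Mean distance (km) from each shelf pixel to its nearest ice pixel.

    Both arguments are lists of (x_km, y_km) pixel centres for the month of
    minimum ice extent.
    """
    if not shelf_pixels or not ice_pixels:
        raise ValueError("both pixel lists must be non-empty")
    total = 0.0
    for sx, sy in shelf_pixels:
        total += min(math.hypot(sx - ix, sy - iy) for ix, iy in ice_pixels)
    return total / len(shelf_pixels)


# Hypothetical coordinates: both shelf pixels lie 5 km from the ice at (3, 4).
shelf = [(0.0, 0.0), (6.0, 8.0)]
ice = [(3.0, 4.0), (100.0, 100.0)]
distance_km = mean_shelf_to_ice_distance(shelf, ice)
```

For full Arctic grids a spatial index would be faster, but the quantity computed is the same.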

We plotted GMAT change against these habitat features to evaluate potential
nonlinearities in the relationships. Figure 2 and Supplementary Figs 5 and 6 illustrate
annual mean GMAT values (x-axis) and corresponding habitat values (y-axis)
for each year of each simulation (small dots). Each scenario is shown in a different 
colour. Large connected dots in each plot are centred on the means, over all years, of 
the annual GMAT values and values of the habitat-related variables, where GMAT 
lies within 0.25 °C bins centred on 0.25 °C, 0.5 °C, 0.75 °C, etc., for all simulations 
performed for each scenario. Large dots are not in exact vertical alignment because 
the means of the GMAT values in each bin differ among scenarios. 
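
The binned means behind the large dots can be sketched as below. The example assumes bins of width 0.25 °C centred on multiples of 0.25 °C and a list of (GMAT, habitat) pairs pooled over all years and simulations of one scenario; the function name and data are hypothetical.

```python
from collections import defaultdict


def binned_means(samples, bin_width=0.25):
    """Mean GMAT and mean habitat value per GMAT bin.

    `samples` is a list of (gmat_change, habitat_value) pairs. Returns a dict
    mapping each bin centre to (mean_gmat, mean_habitat).
    """
    groups = defaultdict(list)
    for gmat, habitat in samples:
        groups[round(gmat / bin_width)].append((gmat, habitat))
    result = {}
    for index in sorted(groups):
        members = groups[index]
        mean_gmat = sum(g for g, _ in members) / len(members)
        mean_habitat = sum(h for _, h in members) / len(members)
        result[index * bin_width] = (mean_gmat, mean_habitat)
    return result


# Hypothetical annual values for one scenario.
scenario = [(0.20, 100.0), (0.30, 90.0), (0.52, 80.0)]
dots = binned_means(scenario)
```

Because each dot is placed at the mean GMAT of the samples in its bin rather than at the bin centre, dots from different scenarios need not align vertically.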
Bayesian network model. The effects of future habitat alteration on probabilities 
of future polar bear persistence were projected with a beta version” of the Bayesian 
network model used previously’. The beta model was reviewed by two other polar 
bear experts and modified accordingly. Some conditional probabilities were modi- 
fied to incorporate reviewers’ suggestions and observations noted since building 
the original model. The beta model includes a finer division of bins for sea-ice 
habitat variables, but upper and lower bounds were retained to ensure that the 
range of possible entries in conditional probability tables was consistent with the 
assignments in ref. 3. The final structure (nodes and links) of the beta model is 
nearly identical to that of the alpha model’. 

Our beta model incorporated changes in the four habitat variables projected 
under different greenhouse gas scenarios. We calculated the average per cent of 
future changes, from the 2001-2010 decade, in annual optimal and shelf-ice 
habitat. Changes in the number of ice-free months and the shelf-to-ice distance 
were expressed as the average increases (months of ice absence and kilometres of 
ice retreat) at each decade. 

The Bayesian network model also was informed by the broad range of other 
currently available information including: potential anthropogenic stressors; the 
established links between reduced physical stature and survival and declining sea-ice
availability among polar bears in parts of their range; available qualitative
information indicating that similar processes are underway in parts of the polar bear 
range where quantitative data are not yet available; the fact that polar bears ulti- 
mately are dependent on the sea ice’*”* for consistent foraging success; and that if 
greenhouse-gas-induced warming continues to increase, essential polar bear sea-ice 
habitats ultimately will disappear’. These additional factors were incorporated into 
the model as ordinal or qualitative categories or as background with which con- 
ditional probability tables were parameterized. The beta model incorporated 4 
greenhouse gas scenarios and was applied to each of the four ecoregions at four 
future decadal time periods: 2020-2029, 2045-2054, 2070-2079 and 2090-2099. At 
each time period, states of these variables could represent a condition similar to 
present, better than present, or worse than present (see tables 3 and 4 in ref. 3).

Intercalation of a new tier of transcription regulation into an ancient circuit


Lauren N. Booth, Brian B. Tuch & Alexander D. Johnson


Changes in gene regulatory networks are a major source of evolu- 
tionary novelty'*. Here we describe a specific type of network 
rewiring event, one that intercalates a new level of transcriptional 
control into an ancient circuit. We deduce that, over evolutionary 
time, the direct ancestral connections between a regulator and its 
target genes were broken and replaced by indirect connections, 
preserving the overall logic of the ancestral circuit but producing 
a new behaviour. The example was uncovered through a series 
of experiments in three ascomycete yeasts: the bakers’ yeast 
Saccharomyces cerevisiae, the dairy yeast Kluyveromyces lactis 
and the human pathogen Candida albicans. All three species have 
three cell types: two mating-competent cell forms (a and α) and the
product of their mating (a/α), which is mating-incompetent. In the
ancestral mating circuit, two homeodomain proteins, Mata1 and
Matα2, form a heterodimer that directly represses four genes that
are expressed only in a and α cells and are required for mating. In
a relatively recent ancestor of K. lactis, a reorganization occurred.
The Mata1-Matα2 heterodimer represses the same four genes
(known as the core haploid-specific genes) but now does so indirectly
through an intermediate regulatory protein, Rme1. The overall logic
of the ancestral circuit is preserved (haploid-specific genes ON in a
and α cells and OFF in a/α cells), but a new phenotype was produced
by the rewiring: unlike S. cerevisiae and C. albicans, K. lactis integrates
nutritional signals, by means of Rme1, into the decision of
whether or not to mate. 

In S. cerevisiae, K. lactis and C. albicans, three cell types (a, α and
a/α) are specified by transcriptional regulators (sequence-specific
DNA-binding proteins) encoded at the mating type locus. An important
part of this cell-type-specific circuit is the regulation of the haploid-specific
genes (hsgs), a group of genes that are expressed in a and α cells
but not in a/α cells. The full sets of hsgs were previously identified in
S. cerevisiae and C. albicans (ref. 5, and B.B.T., Q. M. Mitrovich, F. M.
De La Vega, C. K. Monighetti and A.D.J., unpublished observations)
but not in the related species K. lactis. To examine the evolution of this
portion of the mating circuit, we identified the genes in the K. lactis hsg
regulon and compared them to those in S. cerevisiae and C. albicans. By
profiling the expression patterns of wild-type a, α and a/α K. lactis cells
genome-wide, we identified 12 genes that are clear hsgs under the
conditions tested (Fig. 1a), two of which, RME1 (referred to previously
as MTS1) and STE4, were previously identified as hsgs in
K. lactis. Comparison of all the hsgs in the three species revealed a
substantial level of turnover in the regulon; in other words, an hsg in
one species is not necessarily an hsg in the other two (Fig. 1b).
However, an ancestral core of four hsgs (GPA1, STE4, STE18 and
FAR1) shares a common expression pattern in all three species. The
first three genes encode the heterotrimeric G protein that, in the presence
of mating pheromone, activates a downstream mitogen-activated
protein kinase (MAPK) cascade required for mating. Far1 lies
further downstream in this pathway and mediates two responses
needed as a prelude to mating: cell cycle arrest and the formation
of mating projections.


In S. cerevisiae and C. albicans a/α cells, all four genes of the hsg core
regulon are directly repressed by the transcription regulator a1-α2
(refs 4, 6), a heterodimer encoded by one gene at the MATa locus
and one gene at the MATα locus. To determine whether this was also
true in K. lactis, genome-wide chromatin immunoprecipitations
(ChIP-chip) of a1 and α2 were performed in K. lactis a/α cells. In total,
the upstream regions of 14 genes were observed to be occupied by both
a1 and α2, including the RME1 gene (Fig. 1c), which is also a1-α2
regulated in S. cerevisiae. a1 and α2 ChIP peaks were not observed at
the promoters of any of the four core hsgs in K. lactis, indicating that,
unlike in S. cerevisiae and C. albicans, these genes are not directly
regulated by a1-α2.

To confirm the absence of direct a1-α2 regulation at K. lactis hsgs, we
identified the a1-α2 recognition motif in K. lactis from the ChIP data,
using a de novo motif-finding program. The highest-scoring motif was
similar to the a1-α2 motifs previously identified in S. cerevisiae and
C. albicans (Fig. 1d). Indeed, the S. cerevisiae motif is efficiently recognized
by the C. albicans a1-α2 protein, confirming that key features of
this sequence have remained largely unchanged in the three species. We
searched the regions 2 kilobases upstream of each K. lactis core hsg for
the K. lactis a1-α2 motif but did not find significant matches, confirming
the absence of direct a1-α2 regulation of these genes. (Whereas the
a1-α2 site upstream of RME1 had a log-odds score of 4.98, the best
matches at the core hsgs ranged from −0.70 to 0.93.) These results
indicate that although the ancestral core hsg expression pattern is conserved
in K. lactis, the mechanism of the regulation has changed.
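
To make the idea of a log-odds motif score concrete, the following sketch scores candidate sites against a position weight matrix and scans a sequence for its best match. It is not the motif-finding program used in the study: the toy matrix, the uniform background, the base of the logarithm and the forward-strand-only scan are all assumptions chosen for illustration.

```python
import math


def log_odds_score(site, pwm, background, base=2.0):
    """Sum over positions of log(p_motif / p_background) for one site."""
    if len(site) != len(pwm):
        raise ValueError("site and motif must have the same length")
    return sum(
        math.log(column[nt] / background[nt], base)
        for nt, column in zip(site, pwm)
    )


def best_match(sequence, pwm, background, base=2.0):
    """Best-scoring window on the forward strand as (score, start), or None."""
    width = len(pwm)
    best = None
    for start in range(len(sequence) - width + 1):
        score = log_odds_score(sequence[start:start + width], pwm, background, base)
        if best is None or score > best[0]:
            best = (score, start)
    return best


# Hypothetical two-position motif that prefers "AA".
column = {"A": 0.5, "C": 0.25, "G": 0.125, "T": 0.125}
toy_pwm = [column, column]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
top_score, top_start = best_match("CCAAG", toy_pwm, uniform)
```

A promoter whose best window scores near zero or below, as reported for the core hsgs, matches the motif no better than background sequence would.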

To understand how the K. lactis hsgs are cell-type regulated we
searched the upstream regions of the 12 genes identified as hsgs by
expression array (Fig. 1a) for cis-regulatory motifs. The second-highest
ranking motif (the top-ranking motif was a repeat sequence) was found
in 11 out of 12 of the promoters (Fig. 2a) and was similar to the
S. cerevisiae Rme1 motif (K. lactis consensus, GAACCNMAA;
S. cerevisiae consensus, GAACCTCAA). This motif is also similar
to, although longer than, the K. lactis Rme1 motif derived previously.
The Rme1 motif is absent from S. cerevisiae and C. albicans hsg
promoters.

In S. cerevisiae, Rme1 was initially identified as a repressor of meiosis
and sporulation, and was later shown to act as a transcriptional
activator of other genes. In K. lactis, Rme1 was shown to regulate
mating-type interconversion (the switching of a and α cells to the
opposite cell-type by means of DNA rearrangement). We speculated that
Rme1 was co-opted in the K. lactis lineage to positively regulate the core
hsgs.

To test this hypothesis, we knocked out the RME1 gene in K. lactis a 
cells and examined the gene expression profile by microarray. We found 
that 20 genes were downregulated in the knockout strain (Fig. 2b), 
including all four of the core hsgs (P = 2 X 10 '°, hypergeometric 
distribution). We also observed a set of genes that was upregulated in 
the absence of Rme1 (Supplementary Fig. 1), including a significant
number of genes orthologous to S. cerevisiae sporulation genes
(Gene Ontology (GO): 0043934, n = 14, P = 10° "4, hypergeometric

Figure 1 | The core hsgs are not directly regulated by a1-α2 in K. lactis.

a, The expression profiles of the set of 12 hsgs identified in K. lactis. Note that
phosphate starvation induces expression of the hsgs and is required to identify
these genes. For example, when starved for phosphate, the heterotrimeric G
protein genes are expressed in a and α cells at levels about fivefold higher than in
a/α cells. b, A comparison of hsgs defined by transcriptional profiling in S.
cerevisiae (Sc), K. lactis (Kl) (panel a) and C. albicans (Ca) (ref. 5, and B.B.T., Q.
M. Mitrovich, F. M. De La Vega, C. K. Monighetti and A.D.J., unpublished
observations) shows a conserved subset of hsgs (GPA1, STE4, STE18, FAR1),


distribution) indicating that the function of Rme1 in regulating meiosis
is shared with S. cerevisiae. Thus, whereas some of the targets of Rme1
have remained the same as in the common ancestor of S. cerevisiae and
K. lactis, Rme1 has gained new targets in the K. lactis lineage, including
the core hsgs.
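
The enrichment P values quoted above come from the hypergeometric distribution. A minimal sketch of such an upper-tail test is given below; the gene counts in the example are hypothetical, because the size of the gene population used in the study is not stated in this section.

```python
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) when `draws` genes are sampled without replacement.

    population: total number of genes considered
    successes:  genes in the category of interest (e.g. the core hsgs)
    draws:      genes in the experimental set (e.g. downregulated genes)
    observed:   overlap between the two sets
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total


# Hypothetical toy case: all 4 category genes found among 4 genes drawn from 10.
p_value = hypergeom_upper_tail(population=10, successes=4, draws=4, observed=4)
```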

To test whether Rme1 directly regulates the core hsgs in K. lactis, we
performed a genome-wide ChIP of Rme1 in a cells. ChIP peaks were
observed at the promoters of the four core hsgs (Fig. 2c) and were
centred over the Rme1 motifs (Fig. 2c). Thus, in the K. lactis lineage,
Rme1 was gained as a direct activator of the core hsgs by the acquisition
of Rme1 cis-regulatory sequences at all four genes. We note that
Rme1 is not the only regulator of the K. lactis hsgs; for example, STE18
is repressed by Sir2 (ref. 23).

We next tested the biological role of Rme1 in mating in K. lactis,
S. cerevisiae and C. albicans by comparing wild-type and RME1 knockout
a cells. In response to α pheromone, a cells form mating projections
(polarized growth towards the source of pheromone). When
S. cerevisiae and C. albicans wild-type and Δrme1 a cells were exposed
to α mating pheromone, both strains formed mating projections
normally (Fig. 2d). In contrast, whereas K. lactis wild-type a cells
produced mating projections in response to pheromone, Δrme1 a cells
did not, indicating that this biological response was dependent on
Rme1 (Fig. 2d). As a second test of the role of Rme1, we examined
mating directly using a quantitative mating assay. No difference was
observed between the mating efficiencies of wild-type a cells and those
of Δrme1 a cells for S. cerevisiae and C. albicans (Fig. 2e). In contrast,

which we refer to as the core hsgs (bold in a and b). c, ChIP enrichment profiles
from experiments using haemagglutinin (HA)-tagged MATa1 a/α cells
(magenta), HA-tagged MATα2 a/α cells (blue) and, as a control, untagged
a/α cells (green). The ChIP enrichment was determined by hybridization to a
tiling microarray. The location of the a1-α2 motif in the RME1 promoter is
indicated by the orange star. The genes (tan boxes) are all transcribed in the
reverse direction. Data were visualized with MochiView. d, The K. lactis a1-α2
motif determined from the ChIP-chip data. For comparison, the S. cerevisiae
and C. albicans motifs (derived from published ChIP data) are also shown.


the K. lactis Δrme1 a cell mating efficiency was decreased, relative to
the wild type, by a factor of at least 10° (Fig. 2e). Thus, the ability to 
mate is critically dependent on Rme1, but only in K. lactis.

Unlike S. cerevisiae and C. albicans, K. lactis requires a starvation
signal to mate and to respond to pheromone. Although several
different types of starvation signal can prime K. lactis to respond to
pheromone, we found that phosphate starvation is particularly
potent, and it was used in subsequent experiments. Our expression
profiling experiments (Fig. 1a) revealed that K. lactis requires starvation
to express most of its mating genes. RME1 was also highly
induced (24-fold) by phosphate starvation (Fig. 1a). We note that
S. cerevisiae RME1 transcript levels also increase tenfold under starvation
conditions, suggesting that regulation of RME1 by starvation
may be ancestral to S. cerevisiae and K. lactis.

We next investigated in greater detail how the starvation signal is 
incorporated in the K. lactis mating regulatory circuit. The simplest 
model consistent with the data presented so far is that starvation 
upregulates RME1, which in turn activates transcription of the hsgs. 
A prediction of this model is that ectopic expression of RME1 in
K. lactis should override the requirement for starvation in expressing 
the hsgs. We created an a strain overexpressing RME1 to levels that 
were within tenfold of the level in starved wild-type cells (using the Kl 
LAC4 promoter) and found that overexpression of RME1 (in rich 
YEP-galactose medium) was sufficient to induce expression of the 
heterotrimeric G protein subunits (Fig. 3a). Overexpression of 
RME1 is also sufficient to allow K. lactis to form mating projections

Figure 2 | RME1 is a direct activator of hsg expression and is required for K.
lactis mating. a, The K. lactis Rme1 motif found by a de novo search of the 12
Kl hsgs and the S. cerevisiae motif derived from two experimentally
characterized binding sites. b, The set of 19 genes repressed twofold or
greater relative to wild type when RME1 is absent and the cells are phosphate-starved.
In bold are the core hsgs. c, Rme1 is a direct regulator of the core hsgs.
ChIP of Rme1 was performed in K. lactis c-Myc-tagged RME1 a cells (blue and
green lines, two biological replicates) and untagged, control a cells (orange line).
The immunoprecipitated DNA was hybridized to a tiling microarray. The
genes (tan boxes) above the line are transcribed in the forward direction and
those below are transcribed in the reverse direction. The location of the Kl
Rme1 motif is indicated by a purple star. d, Rme1 is required only in K. lactis to


in response to pheromone in rich medium (Fig. 3b). These results 
strongly support the model by showing that upregulation of RME1 
is sufficient to cause biologically relevant upregulation of the hetero- 
trimeric G proteins. 

Thus, the rewiring of the K. lactis hsg circuit (summarized in Fig. 4) 
resulted in a new network configuration and a novel phenotype, rela- 
tive to the ancestor. Our results suggest a possible evolutionary path for 
this rewiring. In the ancestor of all three yeasts, the hsgs were directly 
repressed by a1-α2. Either in an ancestor to S. cerevisiae and K. lactis or
independently in each lineage, RME1 was brought under nutritional
regulation. Finally, in the K. lactis lineage alone, two steps occurred: the
hsgs lost the cis-regulatory sequences for a1-α2 and gained the cis-regulatory
sequences for Rme1. As described in Supplementary Fig. 2,
it is possible to determine more precisely when the rewiring of the core
hsgs occurred. We can infer that direct a1-α2 regulation of the core
hsgs was probably lost several times in the ascomycete lineage, and that 
the K. lactis form of regulation of the hsgs probably arose after K. lactis 


respond to mating pheromone. Wild-type or RME1 knockout a cells were
exposed to either α mating pheromone or a mock treatment of
dimethylsulphoxide (DMSO). Mating projections form readily when both
wild-type and Δrme1 S. cerevisiae and C. albicans cells are exposed to mating
pheromone. Only the K. lactis Δrme1 a cells were unable to respond to the
presence of α mating pheromone. e, Rme1 is required for mating in K. lactis but
not in S. cerevisiae or C. albicans. Quantitative mating assays were performed
by mating wild-type or Δrme1 a cells to wild-type α cells. In S. cerevisiae and C.
albicans the percentage of a cells that was able to mate is similar for wild-type
and Δrme1. K. lactis Δrme1 a cells are mating incompetent; no mating products
were isolated from the Δrme1 a × wild-type α mating.


and the closely related species L. kluyveri branched from their common 
ancestor. 

Although we do not know whether acquisition of the K. lactis form 
of regulation was adaptive, this type of regulation makes logical sense 
given that the primary mode of growth of K. lactis is as a haploid’®. The 
formation of spores is a strategy employed by many yeasts to survive 
harsh environments. For starvation to give rise to spores, K. lactis 
would first have to mate (to form the sporulation-competent a/α cell
type), thus rationalizing the link between starvation and mating. In 
contrast, S. cerevisiae in the wild is typically at least diploid’’ and forms 
spores directly in response to starvation. Thus, the coupling of mating 
and starvation makes conceptual sense for K. lactis in comparison with 
S. cerevisiae. 

We have described a case in which a new tier of regulation has been 
intercalated into an ancient transcription circuit consisting of a regulator 
(a homeodomain heterodimer) and a set of target genes. This change 
involved breaking the original connections between the regulator and its

Figure 3 | Overexpression of RME1 is sufficient for hsg expression in the
absence of nutrient starvation. a, In the overexpression strain (pLAC4-RME1),
RME1 transcription is induced by galactose-containing medium, a
condition that does not cause expression of the heterotrimeric G proteins or
pheromone response in wild-type (WT) cells. A strain using the empty pLAC4
vector was used as a control. The transcripts were measured relative to ACT1
transcript levels by RT-quantitative PCR (means and s.d., n = 3). In the
absence of a starvation signal the hsgs, but not CKB1 (a non-hsg control), are
upregulated when RME1 is overexpressed. b, RME1 overexpression allows cells
to respond to mating pheromone in the absence of a starvation signal. K. lactis
a cells that contained only the endogenous RME1 copy and an empty pLAC4
vector (WT), the endogenous copy of RME1 and RME1 driven by the pLAC4
promoter (WT + pLAC4-RME1) or only RME1 driven by the pLAC4
promoter (Δrme1 + pLAC4-RME1) were grown in YEP-galactose and exposed
to α mating pheromone. Wild-type cells were unable to form mating
projections in the absence of a starvation signal, but both strains overexpressing
RME1 (pLAC4-RME1) formed mating projections in the absence of a
starvation signal.


Figure 4 | A simplified model for the evolution of regulation of core hsgs in
three yeasts. In all three species the core hsgs are repressed by a1-α2; thus, they
are ON in a and α cells and OFF in a/α cells. In S. cerevisiae and C. albicans the
repression is direct (a1-α2 binds to the promoters of these genes), but in K.
lactis it is indirect, through Rme1. The circuit rewiring in the K. lactis lineage
has resulted in a new mating behaviour; this species is able to mate only when
starved. We show that this behaviour is due to the intercalation of Rme1, which
is upregulated by starvation in K. lactis.


962 | NATURE | VOL 468 | 16 DECEMBER 2010 


target genes and replacing them with a more complex type of hierarchy 
(Fig. 4). Intercalation may be a common way in which regulatory circuits 
evolve. This type of ‘intercalary evolution’ was first proposed’* to 
account for a common origin of eyes. In a wide variety of species, the 
transcription regulator Pax6 lies at the top of the eye development 
hierarchy, and rhodopsins occupy the bottom. According to the pro- 
posal, different types of eye arose from evolutionary intercalation of a 
variety of regulatory and structural genes within this simple, deeply 
conserved, regulatory relationship. The change we describe here is less 
complex and provides a concrete example of evolutionary intercalation, 
one that is responsible for an important feature of modern mating 
behaviour in K. lactis. It has been known for decades that K. lactis (unlike 
its relatives S. cerevisiae and C. albicans) requires starvation to mate**, 
and we have shown that this behaviour is due to the new configuration of 
the K. lactis mating circuit. 


METHODS SUMMARY 


Gene expression array. RNA was isolated by hot phenol extraction and reverse 
transcribed, and the resulting complementary DNAs were coupled to Cy5. A 
pooled mixture of the cDNAs was coupled to Cy3 and used as a reference. 
Labelled cDNAs were hybridized to Agilent arrays for visualization. 

ChIP-chip. ChIP experiments were performed as described previously”, with 
minor modifications. 

Pheromone response assays. Exponential-phase cultures were exposed to 13-mer
α-mating pheromone, and the formation of mating projections was monitored by
microscopy. 

Quantitative mating assay. a and α cultures were grown independently to exponential
phase and then mixed together with α cells in fivefold excess under mating
conditions. The mating products were selected for on medium that either a and a/α
cells or only a/α cells could grow on, and efficiencies were calculated as efficiency
= (a/α colonies)/(a and a/α colonies).
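
The efficiency formula above amounts to a simple ratio of colony counts, as in this short sketch. The function name and the colony counts are hypothetical, and equal plating dilutions on the two media are assumed.

```python
def mating_efficiency(mated_colonies, a_and_mated_colonies):
    """Fraction of a cells that mated.

    mated_colonies:       colonies on medium where only a/alpha cells grow
    a_and_mated_colonies: colonies on medium where both a and a/alpha cells grow
    """
    if a_and_mated_colonies <= 0:
        raise ValueError("the denominator plate must have colonies")
    return mated_colonies / a_and_mated_colonies


# Hypothetical counts from one mating.
efficiency = mating_efficiency(30, 120)
```

If the two media are plated at different dilutions, the counts would need to be scaled to a common dilution before taking the ratio.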

RT-quantitative PCR. Cultures were grown to exponential phase in YEP-galactose 
medium, and RNA was isolated by extraction with hot phenol. RNA was reverse 
transcribed and the cDNAs were quantified by quantitative PCR. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature.

METHODS


Medium. Details of the medium used in the experiments presented here can be 
found in ref. 31. The recipe for the phosphate starvation medium can be found in 
ref. 25. 

Strains and strain construction. The strains used in this study can be found in 
Supplementary Information. S. cerevisiae strains are S288C background and 
C. albicans strains are SC5314 background. 

Gene disruption cassettes for knockouts and taggings in K. lactis and L. kluyveri
were generated by fusion PCR using the primers listed in Supplementary
Information. Fusion PCRs were performed in a 50-µl reaction containing 0.5 µl
ExTaq (Takara Bio Inc.), 0.25 mM dNTPs, 0.2 µM each primer and about 25 ng of
template. The reactions were incubated as follows: 94 °C for 3 min; 35 cycles of 94
°C for 30 s, 50-55 °C (depending on primer) for 30 s and 72 °C for 1 min per
kilobase; and 72 °C for 5 min. The first round of PCR consisted of three reactions
that amplified the flanking homologous sequence from K. lactis genomic DNA
with primers 1 and 3 or 4 and 6, and amplified the markers from the appropriate
plasmids with primers 2 and 5. The URA3 marker was amplified from YEp24, the
TRP1 marker from YEplac112, the c-Myc tagging cassette from pFA6a-13Myc-kanMX6
(ref. 33) and the 3×HA tagging cassette from pYMN-20 (ref. 34). The
products were purified with the QIAquick PCR Purification Kit (Qiagen). The
second round of amplification (the fusion round) used 1 µl of each purified flank
PCR product and 2 µl of the purified marker PCR product. This product was
purified with the QIAquick PCR Purification Kit.

The purified fusion PCR products were transformed into K. lactis and L. kluyveri 
by electroporation**”*. Transformants were confirmed to be correct by colony PCR 
with the check primers listed in Supplementary Information. Tagged genes were 
also verified by sequencing. 

To avoid mating-type switching in K. lactis and to increase the efficiency of 
tagging genes in the MAT loci, we created strains in which the silent MAT loci 
(HMLα and HMRa) were knocked out. SAY509 and SAY572 were transformed
with gene disruption cassettes for HMRa (yLB10b) and HMLα (yLB11b), respectively.
To generate strains with both silent cassettes deleted, these strains were
mated as described for 2 days, and diploids were selected for on URA/TRP drop-out
SCD medium. The resulting strain (yLB12d) was sporulated on pre-SPO plates
for 2 days, and haploids with both silent cassettes deleted were selected for on
URA/TRP drop-out medium. The mating types were determined by colony PCR
using the MAT, MATa and MATα check primers. These strains (yLB13a and
yLB14) were mated to generate an a/α strain lacking silent MAT loci (yLB15b).

To create N-terminally 3×HA-tagged MATa1 and MATα2 for ChIP experiments,
yLB13a and yLB14 were transformed with the gene disruption cassettes
created with the HA primers and the MATa1-tag or MATα-tag primers. yLB55
was mated with yLB14 to create yLB58, the a/α strain used in the MATa1 ChIP
experiment. yLB56a was mated with yLB13a to create yLB57a, the a/α strain used
in the MATα2 ChIP experiment.

RME1 was knocked out (yLB21a) and tagged (yLB54b) in SAY572.

yDG957 is the mated product of SAY509 and SAY572. 

yLB33al was created by sorbose-selecting TF028X, as described in ref. 37. 
AH136al was sorbose-selected from SN87, and HIS1 and LEU2 were added back 
as described in ref. 38. 

pLAC4 (described below) was transformed into SAY572 after digestion with
SacII (New England Biolabs) to generate yLB61b. pLAC4-RME1 (described
below) was transformed into SAY572 and yLB21a to create yLB64a and yLB65,
respectively.

pLAC4 was created by modifying pKLAC1 (New England Biolabs) as follows. 
The pKLAC1 vector was cut with HindIII and XhoI and gel-purified with the
QIAquick Gel Extraction Kit to remove most of the α-mating pheromone secretory
domain. The vector was dephosphorylated with APEX heat-labile dephosphatase
(Epicentre), in accordance with the kit’s instructions. Primers BamHI add and
BamHI add, reverse complement were hybridized to each other at a concentration
of 40 µM in 1× T4 PNK buffer (New England Biolabs) and 40 mM additional NaCl
by incubation at 94 °C for 2 min and slow cooling to 10 °C at a rate of 0.1 °C s⁻¹. The
hybridized primers created sticky ends for HindIII and XhoI cut sites. The hybridized
primers were phosphorylated by adding ATP to a final concentration of 1
mM and T4 PNK kinase (New England Biolabs) to a final concentration of 200 U
ml⁻¹ and incubating them at 37 °C for 10 min. The cut, dephosphorylated vector
was ligated to the hybridized primers by using Fast-Link Ligase (Epicentre) at a 1:5
molar ratio and transformed into DH5α cells. The acetamidase marker in the
plasmid was then replaced with kanMX6. kanMX6 was amplified from pFA6a-13myc-kanMX6
(ref. 33) by using the KAN primers. The plasmid and kanMX6
marker were cut with BsrGI and XmaI (New England Biolabs) and ligated and
transformed as described above. This plasmid was used as an empty vector control 
(referred to as pLAC4) in the RME1 overexpression experiments. 


pLAC4-RME1 was generated by amplifying RME1 from K. lactis genomic DNA 
with the primers RME1 + BglII and RME1 + NotI. The RME1 gene and the pLB12 
plasmid were cut with BglII and NotI (New England Biolabs) and ligated and 
transformed as described above. 

Orthologous gene set mapping. In this study we used the orthologous gene sets 
defined previously”. 

Gene expression arrays. Arrays were designed with OligoArray (v2.1.3). The 
reference sequence used was downloaded from the NCBI genome project for 
Kluyveromyces lactis NRRL Y-1140, records NC006037 to NC006042. The predicted 
messenger RNA sequences were used. The sequences of two shorter open reading 
frame (ORF)-coding transcripts, MFA1 and AGA2, that were not annotated in 
NCBI at the time were added to the reference. The following parameters were used: 
maximum number of oligonucleotides to design per input sequence, 3; size range, 60 
to 60; maximum distance between the 5’ end of the oligonucleotide and the 3’ end of 
the input sequence, 1,500; minimum distance between the 5’ ends of two adjacent 
oligonucleotides, 69; Tm range, 75 to 97 °C; GC range, 15.0 to 65.0; threshold to reject 
secondary structures, 65.0; threshold to start to consider cross-hybridizations, 65; 
sequence to avoid in the oligonucleotide: GGGGG;CCCCC;TTTTT;AAAAA. 

The arrays were printed by Agilent, using the 4 x 44K format. 

K. lactis strains were grown in either rich medium (YEPD) to an attenuance 
(D600) of 0.9 or in phosphate starvation medium, with or without α-mating 
pheromone, as described previously. The 50-ml cultures were centrifuged for 5 
min at 3,700g, and the pellet was resuspended in 10 ml of 1× TE and centrifuged 
again. The supernatant was removed and pellets were frozen in liquid nitrogen and 
stored at −80 °C. 

RNA was isolated and reverse transcribed as described previously** with the 
exception that the RNA isolation protocol was scaled to 50-ml cultures and that 
SuperScript II (Invitrogen) was used. Additionally, reverse transcription reactions 
were performed on either individual samples or on RNA in which an equivalent 
amount of each RNA sample was pooled. 

For the individual samples, 1.3 µg of cDNA was dried and resuspended in 5 µl of 
0.1 M sodium bicarbonate. For the pooled samples, 5.9 µg of cDNA was dried and 
resuspended in 22.5 µl of 0.1 M sodium bicarbonate. An equivalent volume of Cy3 
(pool) or Cy5 (individual) dye (Amersham) was added (dyes were resuspended in 
60 µl of DMSO) and the reaction was incubated in the dark at 65 °C for 20 min. 
Labelled cDNAs were purified with a Clean and Concentrator-5 kit (Zymo 
Research). 

Equal amounts of the Cy3-labelled and Cy5-labelled cDNA were hybridized 
overnight to the array, as described in the Agilent protocol. After hybridization, the 
arrays were washed as specified by Agilent. Arrays were scanned at 5 µm, aver- 
aging two lines, with an Axon GenePix 4000A scanner. Arrays were gridded with 
GenePix Pro v5.1. Global Lowess normalization analysis was performed for each 
array with a Goulphar script (R Foundation for Statistical Computing). 
Normalized data were collapsed first by averaging the result for all duplicate 
probes and finally by taking the median of the probes for each ORF. Data were 
transformed as described for each experiment. Microarray data were clustered 
with Cluster version 3.0 (ref. 41) and visualized with Java TreeView v1.1.3 (ref. 42). 
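
The two-step collapse described above (average duplicate probes, then take the median across the probes of each ORF) can be sketched as follows. This is a minimal illustration only: the function name `collapse_probes` and the `(orf, probe_id, value)` tuple layout are assumptions made for the example and are not taken from the original analysis scripts.

```python
from collections import defaultdict
from statistics import mean, median


def collapse_probes(measurements):
    """Collapse normalized array values to one value per ORF.

    `measurements` is an iterable of (orf, probe_id, value) tuples
    (a hypothetical layout chosen for this sketch).
    """
    # Step 1: average all duplicate spots of the same probe.
    by_probe = defaultdict(list)
    for orf, probe_id, value in measurements:
        by_probe[(orf, probe_id)].append(value)
    probe_means = {key: mean(values) for key, values in by_probe.items()}

    # Step 2: take the median across the distinct probes of each ORF.
    by_orf = defaultdict(list)
    for (orf, _probe_id), value in probe_means.items():
        by_orf[orf].append(value)
    return {orf: median(values) for orf, values in by_orf.items()}
```

Using the median in the second step limits the influence of a single aberrant probe when up to three oligonucleotides were designed per input sequence.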

ChIP-chip. K. lactis was grown in either YEPD to D600 = 0.4 or in phosphate 
starvation medium as described previously. The ChIP, DNA amplification, label- 
ling and hybridization were performed as described previously. For the a1 and α2 
ChIPs, 2 µl of 5 mg ml⁻¹ mouse anti-HA antibody clone 12CA5 (Roche) were 
used. The K. lactis tiling arrays used have been described previously. Peak finding 
was performed with the ‘Extract peaks from Data Set(s)’ utility of MochiView. 
Peak extraction was applied independently to each normalized ChIP-chip data 
item. Peak finding significance thresholds were kept at their default values (P = 
0.001). For the a1 and α2 ChIPs, regions of overlap between the two ChIP-chips 
were determined and the overlapping chromosomal coordinates were extracted, 
yielding 22 regions. This set was filtered to remove peaks in the telomeres or those 
that fell entirely in an ORF. MEME was performed on the remaining 14 peaks. 
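
The overlap-and-filter step can be illustrated with a short sketch. It is not the MochiView procedure itself: the function names, the `(chromosome, start, end)` tuple layout and the use of closed coordinates are assumptions made for the example.

```python
def overlapping_regions(peaks_a, peaks_b):
    """Return the coordinates shared by two peak lists.

    Each peak is a (chromosome, start, end) tuple with start <= end,
    using closed coordinates (an assumption of this sketch).
    """
    overlaps = []
    for chrom_a, start_a, end_a in peaks_a:
        for chrom_b, start_b, end_b in peaks_b:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start <= end:  # the two peaks share at least one position
                overlaps.append((chrom_a, start, end))
    return sorted(overlaps)


def lies_within_any(region, features):
    """True if `region` falls entirely inside one feature (for example an ORF)."""
    chrom, start, end = region
    return any(c == chrom and s <= start and end <= e for c, s, e in features)
```

A filter in the spirit of the text would keep only regions for which `lies_within_any(region, orfs)` is false and which lie outside the telomeres; the telomere boundaries would have to be supplied separately.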

Pheromone response assay. For K. lactis (SAY572 and yLB21a), the pheromone 
response assay was performed as described previously, with the exception that the 
medium was not supplemented with additional leucine. For C. albicans (AH136a1 
and yLB33a1), cells were grown in SCD medium at room temperature (23-25 °C) to 
D600 = 1.0 and exposed to pheromone (or mock 10% DMSO treatment) for 4 h as 
described in ref. 43. For S. cerevisiae (BY 4674 and YM1768), the cells were grown in 
YEPD at 30 °C to D600 = 0.5. α-Mating pheromone (5 µM; Sigma-Aldrich) was 
added and the cells were grown at 30 °C for 90 min; the presence of mating 
projections was monitored by microscopy. 

Quantitative mating. The strains and selective medium used for these experi- 
ments are listed in Supplementary Information. 

Yeasts were grown in YEPD (S. cerevisiae, K. lactis) or SCD (C. albicans) at 30 °C 
(S. cerevisiae, K. lactis) or room temperature (C. albicans) to D600 = 0.8. a cells 
(2 × 10⁶; about 200 µl of D600 = 0.8) were mixed with 10⁷ α cells (about 1 ml of 
D600 = 0.8) in 10 ml of YEPD. The mixtures were filtered onto nitrocellulose 
(0.8 µm pore size; Millipore) with a Millipore 1225 Vacuum Manifold. Filter discs 
were placed onto either YEPD + 55 µg ml⁻¹ adenine 2% agar plates (S. cerevisiae, 
C. albicans) or 2% malt extract, 3% agar plates (K. lactis). S. cerevisiae strains were 
allowed to mate for 5 h at 30 °C, K. lactis strains for 2 days at 30 °C, and C. albicans 
strains for 6 days at room temperature. The filter discs were then removed and 
placed in 5 ml of SD medium and the cells were dispersed by vortex-mixing. The 
cell suspensions were sonicated with a Branson Sonifier 450 at 30% power for 10 s. 
Between 1/25 and 2.5 X 10 ° of the cell suspension was plated onto selective 
medium (see Supplementary Information). S. cerevisiae and K. lactis cells were 
grown at 30 °C for 2 days, and C. albicans cells were grown at room temperature 
for 4 days. Mating efficiency was calculated with the limiting parental a cells: 
efficiency = (a/α)/(a + a/α). 
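
As a worked illustration of the efficiency formula, the sketch below converts colony counts to totals in the suspension and then applies the ratio. The function names and the idea of scaling by the plated fraction are assumptions made for the example; the counts themselves would come from the selective plates.

```python
def cfu_in_suspension(colonies, plated_fraction):
    """Scale a colony count on one plate to the whole cell suspension."""
    if not 0 < plated_fraction <= 1:
        raise ValueError("plated_fraction must be in (0, 1]")
    return colonies / plated_fraction


def mating_efficiency(mated_cfu, limiting_parent_cfu):
    """efficiency = mated / (limiting parent + mated).

    `mated_cfu` counts mating products and `limiting_parent_cfu`
    counts unmated cells of the limiting a parent.
    """
    total = limiting_parent_cfu + mated_cfu
    if total <= 0:
        raise ValueError("no colonies counted")
    return mated_cfu / total
```

For example, 30 mated colonies and 70 unmated limiting-parent colonies at the same dilution give `mating_efficiency(30, 70)`, an efficiency of 0.3.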

RT-quantitative PCR. K. lactis strains were grown in YEP-galactose to D600 = 0.8 
and centrifuged at 3,700g for 5 min. The pellets were washed with 1 ml of 1× TE 
and centrifuged at 3,700g for 5 min; the supernatant was removed and the pellets 
were frozen in liquid nitrogen. RNA was isolated and reverse transcribed (using 
SuperScript II) as described previously, with all volumes scaled appropriately. 
cDNAs were quantified with a Bio-Rad CFX96 Real Time machine in a standard 
25-µl reaction using SYBR Green under standard conditions. The primers used are 
listed in Supplementary Information. 

Hypergeometric test. The significance of the set of genes (n = 20) down twofold 
or greater relative to wild type in a phosphate-starved Rme1 knockout containing 
all four core hsgs was calculated with a hypergeometric test. The background set of 
genes (n = 4,769) is defined as all genes that were detectable by our gene 
expression array. 
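
A hypergeometric upper-tail probability of this kind can be computed exactly with integer arithmetic. The sketch below is illustrative: the function name is hypothetical, and mapping the text onto the parameters (background of 4,769 genes, 4 core hsgs, 20 genes drawn, all 4 observed) is our reading of the description rather than a statement of the original calculation.

```python
from fractions import Fraction
from math import comb


def hypergeom_upper_tail(population, successes, draws, observed):
    """P(X >= observed) for draws made without replacement.

    population: size of the background set
    successes:  members of the category of interest in the background
    draws:      size of the gene set being tested
    observed:   category members found in the gene set
    """
    total = comb(population, draws)
    upper = min(successes, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return Fraction(tail, total)


# Assumed mapping of the numbers quoted in the text.
p_value = float(hypergeom_upper_tail(4769, 4, 20, 4))
```

Working with `Fraction` avoids rounding error in the very small tail probabilities that arise when a rare category is fully recovered in a short gene list.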

GO term analysis. For the analysis of the genes that were upregulated in the absence 
of RME1 in K. lactis, we performed GO term analysis using the S. cerevisiae GO term 
finder on SGD (http://www.yeastgenome.org/cgi-bin/GO/goTermFinder.pl). Genes 
that were up fourfold or greater in an RME1 knockout versus wild type in the RME1 
array and that had an orthologue in S. cerevisiae were compared with a background 
set of genes defined as all S. cerevisiae genes orthologous to K. lactis genes that could 
be detected in our gene expression array. 

Noise correlations improve response fidelity and stimulus encoding 

Computation in the nervous system often relies on the integration 
of signals from parallel circuits with different functional properties. 
Correlated noise in these inputs can, in principle, have diverse and 
dramatic effects on the reliability of the resulting computations’ *. 
Such theoretical predictions have rarely been tested experimentally 
because of a scarcity of preparations that permit measurement of 
both the covariation of a neuron’s input signals and the effect on a 
cell’s output of manipulating such covariation. Here we introduce a 
method to measure covariation of the excitatory and inhibitory 
inputs a cell receives. This method revealed strong correlated noise 
in the inputs to two types of retinal ganglion cell. Eliminating cor- 
related noise without changing other input properties substantially 
decreased the accuracy with which a cell’s spike outputs encoded 
light inputs. Thus, covariation of excitatory and inhibitory inputs 
can be a critical determinant of the reliability of neural coding and 
computation. 

Differences in the properties of excitatory and inhibitory synaptic 
inputs to a target cell provide a key control of neural activity. Feed- 
forward inhibitory synaptic input is a ubiquitous example. A delay in 
inhibitory input relative to excitatory input, for example by an extra 
synaptic delay in the circuit providing inhibitory input, can limit res- 
ponse duration to the time window in which the target cell receives 
excitatory but not inhibitory input’. More generally, inhibitory input 
can cancel unwanted responses by arriving before or at the same time 
as excitatory input’®'’. Theoretical work illustrates how the effective- 
ness of these computations depends on the strength of covariation 
between excitatory and inhibitory synaptic inputs*. Thus, although 
synaptic noise will always decrease the reliability of the neural res- 
ponse, strong noise correlations, unlike independent noise, could allow 
fluctuations in inhibitory synaptic input to cancel corresponding fluc- 
tuations in excitatory synaptic input? (Fig. 1). Such noise correlations 
can arise if noise within excitatory and inhibitory pathways originates 
from a common source (Fig. 1, left), for example in densely and randomly 
connected recurrent networks". Noise cancellation in synaptic integ- 
ration could in turn reduce trial-to-trial variability in a cell’s spike output 
(Fig. 1, right). 

The extent and impact of noise correlations depend on several 
network and cellular properties, including nonlinearities in synaptic 
transmission’ or spike generation’® that could decrease correlation 
strength. This dependence makes it difficult to predict the importance 
of noise correlations from modelling alone or from correlations mea- 
sured in cell pairs. Work on the retina provides a rare opportunity to 
provide quantitative experimental information about how noise cor- 
relations affect the coding of physiologically relevant stimuli. Our goal 
was first to measure covariation of the excitatory and inhibitory syn- 
aptic inputs received by a retinal ganglion cell (Fig. 1, (Q1)) and then to 
test how these noise correlations affect the encoding of light stimuli in a 
cell’s spike output (Fig. 1, (Q2)). 

Quantifying the covariation of excitatory and inhibitory synaptic 
input requires measuring these two conductances simultaneously or 
near simultaneously. To do this, we rapidly alternated the ganglion cell 
voltage between the reversal potentials for excitatory and inhibitory 
synaptic inputs, collecting a single sample of each input every 10 ms 
(Fig. 2a). Control experiments indicated that the voltage at the synaptic 
receptors had reached a near-constant value at these sampling times 
(Supplementary Fig. 1). This sampling rate is high in comparison with 
the 50-100 ms time course of a ganglion cell’s response to light inputs. 
To check how well this procedure captured light-dependent changes in 
conductance, we compared the simultaneously measured conduc- 
tances with those measured non-simultaneously when the voltage 
was held constant at the excitatory or inhibitory reversal potential. 
Mean excitatory and inhibitory conductances resulting from a 
repeated, modulated light input differed minimally (Fig. 2b). In 21 
cells, the alternating voltage approach captured 99.9 ± 0.6% of the 
power of the conductance signal and 83 ± 4% of that of the conduc- 
tance noise (mean ± s.e.m.; see Methods). Thus, simultaneous con- 
ductance measurements capture most of the structure in the synaptic 
inputs a ganglion cell receives. 

Simultaneous conductances measured during constant light input 
often exhibited spontaneous excitatory synaptic events accompanied in 
time by inhibitory synaptic events (Fig. 2c, top, black arrowheads). Such 
events in fact typically occurred together. Correlated noise events were 
rarely observed during non-simultaneously measured conductances 
(Fig. 2c, bottom). Correspondingly, the cross-correlation function for 
simultaneously measured excitatory and inhibitory conductances during 
constant light input showed considerable structure, unlike the cross- 
correlation for non-simultaneously measured conductances (single cell: 
Fig. 2d, top; population: Fig. 2d, bottom). Thus, simultaneous conduc- 
tance recordings revealed correlations between converging synaptic 
inputs that were inaccessible from more conventional recordings. 

Figure 1 | Effects of noise correlations on the variability of synaptic current 
and spike output. Neural encoding consists of three basic steps: a stimulus 
shapes excitatory (blue, G_exc) and inhibitory (red, G_inh) synaptic conductances; 
these conductances then shape synaptic currents; and the resulting currents 
control spike generation to produce a sequence of action potentials (spikes). 
Noise correlations will be strong if a common source dominates noise in 
excitatory and inhibitory pathways (Noise_com) and minimal if the dominant 
noise source arises independently (Noise_ind). Correlated (black traces) as 
opposed to uncorrelated (green traces) noise between excitatory and inhibitory 
conductances can lead to lower variability of both the synaptic current and the 
spike output (shaded regions around traces). Understanding this issue requires 
answering two questions. (Q1) How much do converging excitatory and 
inhibitory inputs covary? (Q2) What is the impact of such noise correlations on 
the neural output? 

To determine both the strength of noise correlations during modu- 
lated light input and their effect on a cell’s spike output, we first con- 
sidered midget ganglion cells, which comprise the majority of ganglion 
cells in the primate retina. Midget ganglion cells receive delayed feed- 
forward synaptic inhibition, where the delay reflects an extra synapse 
in the circuit controlling inhibitory input. Thus, excitatory input comes 
directly from bipolar cells, whereas inhibitory input comes from ama- 
crine cells that themselves receive input from bipolar cells. Similar 
delayed feed-forward inhibition is a characteristic of many cortical 
circuits, including hippocampus, cerebellum, barrel cortex and auditory 
cortex. We simultaneously recorded excitatory and inhibitory 
synaptic inputs during a full-field modulated light stimulus (Fig. 3a, left) 
and estimated variability in the synaptic responses by subtracting the 
average synaptic input from each individual trial (Fig. 3a, right). The 
peak correlation strength of the resulting residuals ranged from 0.15 to 
0.5 (Fig. 3b, black traces). Noise correlations in the interleaved non- 
simultaneous conductances were substantially smaller (Fig. 3b, green 
traces). Slow drift in the light response accounted for the remaining 
noise correlations in the non-simultaneous conductances (Supplemen- 
tary Fig. 2). 

Figure 2 | Near-simultaneous recording of excitatory and inhibitory 
synaptic input to an ON-OFF directionally selective ganglion cell. a, Light 
stimulus (S) is presented while the voltage (V) of the cell alternates between the 
excitatory (E_exc) and inhibitory (E_inh) reversal potentials. Excitatory (blue) and 
inhibitory (red) synaptic currents (I) are sampled at the end of each voltage step. 
b, Conductances derived from measured currents (Methods) and averaged 
across multiple repeats of the same stimulus (S). Simultaneously measured 
conductances (solid lines) closely match those (dashed lines) measured non- 
simultaneously with the voltage held fixed at the reversal potentials for 
excitatory or inhibitory input (both excitatory and inhibitory correlations are 
0.91 ± 0.01 (mean ± s.e.m.), 21 cells). a is an enlarged view of the boxed region 
of b. c, Top: section of simultaneously recorded conductances during constant 
light input shows correlated excitatory and inhibitory spontaneous events 
(black arrowheads). Bottom: non-simultaneously recorded conductances also 
show spontaneous events (green arrowheads), but they are rarely correlated. 
Records have been resampled at 50 Hz for comparison with the top 
conductances. d, Top: cross-correlation (mean ± s.e.m., 10 trials) of excitatory 
and inhibitory conductances in an example cell during simultaneous (black) 
and non-simultaneous (green) recording. Bottom: cross-correlation for all 
recorded cells (mean ± s.e.m., 6 cells). 

The alternating-voltage technique could produce artefactual noise 
correlations by overshooting the appropriate reversal potentials for 
excitatory or inhibitory synaptic inputs. For example, holding at a 
voltage positive relative to the excitatory reversal potential could cause 
an increase in the excitatory conductance to be misinterpreted as an 
increase in both the excitatory and inhibitory conductances, thus lead- 
ing to an artefactual correlation. A similar logic holds if a cell is held 
more negative than the reversal potential for inhibitory input. 
However, if anything the alternating-voltage technique fell short of 
the actual reversal potentials and hence underestimated the strength 
of noise correlations (Supplementary Fig. 3). 

Figure 3 | Strength and impact of noise correlations in synaptic inputs to 
primate midget ganglion cells. a, Left: two trials of simultaneously recorded 
conductances during modulated light input (grey). Right: residual 
conductances (trials from left with mean subtracted), which estimate noise in 
each trial. b, Left: cross-correlation (mean ± s.e.m., 12 trials) of excitatory and 
inhibitory residual conductances in an example cell during simultaneous 
(black) and non-simultaneous (green) recording. Right: cross-correlation for 
all recorded cells (mean ± s.e.m., 15 cells). c, Logic of dynamic-clamp 
experiments using simultaneous or shuffled simultaneous conductances in 
place of synaptic input. d, Example spike trains from 12 dynamic-clamp trials of 
simultaneous conductances (black) or their shuffled counterparts (green). SNR, 
signal-to-noise ratio. e, Signal-to-noise ratio of spike trains generated from 
simultaneous conductances versus that of spike trains generated from shuffled 
conductances (dots). The signal-to-noise ratio for simultaneous conductances 
was 1.22 ± 0.04 times higher than that for shuffled conductances 
(mean ± s.e.m., 7 cells, P = 0.0015). 

16 DECEMBER 2010 | VOL 468 | NATURE | 965 
©2010 Macmillan Publishers Limited. All rights reserved 
LETTER 

To determine the effect of covariation of excitatory and inhibitory 
synaptic inputs on a midget ganglion cell’s response to physiological 
inputs, we compared the pattern of spikes produced by simultaneous 
(with noise correlations) and non-simultaneous (without noise corre- 
lations) conductances in dynamic-clamp experiments (Fig. 3c and 
Supplementary Fig. 4). The non-simultaneous conductances consisted 
of shuffled pairings of simultaneously recorded excitatory and inhibi- 
tory conductances; this procedure removed noise correlations while 
holding all other statistics constant. We compared the precision of the 
spike responses to the two sets of conductances by calculating the 
signal-to-noise ratio from repeated dynamic-clamp trials (Fig. 3d; 
see Methods). In all cases, the signal-to-noise ratio was higher for 
conductances with noise correlations (Fig. 3e). Quantifying the tem- 
poral precision of the spike responses using a spike distance metric**”* 
gave similar results (data not shown). Thus, the precision of a midget 
cell’s output in response to light stimuli depends on the covariation of 
excitatory and inhibitory synaptic inputs. 
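
The shuffling control described above can be sketched in a few lines. This is an illustration only: the text states that simultaneously recorded excitatory and inhibitory conductances were re-paired across trials, and the requirement here that no trial keeps its original partner, the function name and the fixed seed are assumptions of the example.

```python
import random


def shuffle_pairings(exc_trials, inh_trials, seed=0):
    """Re-pair excitatory and inhibitory conductance trials.

    Every excitatory trace is matched with an inhibitory trace from a
    different repeat of the same stimulus. The individual traces are
    left unchanged, so each input keeps its own statistics while the
    shared trial-to-trial noise is decoupled.
    """
    n = len(exc_trials)
    if n != len(inh_trials) or n < 2:
        raise ValueError("need matching trial lists with at least two trials")
    rng = random.Random(seed)
    order = list(range(n))
    # Redraw until no index maps to itself, so no original pair survives.
    while any(i == j for i, j in enumerate(order)):
        rng.shuffle(order)
    return [(exc_trials[i], inh_trials[j]) for i, j in enumerate(order)]
```

Because only the pairing changes, the mean response and the variance of each conductance are identical in the shuffled and unshuffled sets; any difference in spike output can then be attributed to the loss of covariation.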

Feed-forward synaptic inhibition can serve a more diverse func- 
tional role when the amplitude or timing of inhibitory input relative 
to excitatory input depends on the stimulus. For example, the ability of 
a subset of retinal ganglion cells to respond to the direction of a moving 
object**** (Fig. 4a, b) relies on cancellation of excitatory input by inhibi- 
tory input in the non-preferred direction'®. Covariation of excitatory 
and inhibitory synaptic inputs could make such a mechanism robust to 
noise, for example by preventing a larger-than-average excitatory syn- 
aptic event from overwhelming the corresponding inhibitory synaptic 
event and causing a response to movement in an inappropriate dir- 
ection. To test this proposal, we recorded simultaneous conductances 
in mouse ON-OFF directionally selective ganglion cells (ON-OFF 
DSGCs) in response to a bar of light moving in different directions 
(Fig. 4a). Excitatory and inhibitory conductances showed strong noise 
correlations that were largely absent from non-simultaneous conduc- 
tances (Fig. 4d; see Supplementary Fig. 5 for results from full-field light 
stimuli). Both excitatory and inhibitory conductances and the strength 
of the noise correlations depended on bar direction (Fig. 4c, d). For 
example, noise correlations in the non-preferred direction were three to 
four times stronger than those in the preferred direction. Furthermore, 
excitatory and inhibitory conductances showed near-perfect covaria- 
tion in the non-preferred direction. 

We tested the impact of noise correlations on direction tuning using 
simultaneous (with noise correlations) and non-simultaneous (with- 
out noise correlations) conductances in dynamic-clamp experiments; 
non-simultaneous conductances consisted of simultaneous conduc- 
tances shuffled between trials but not bar directions. Both the mean 
and the standard deviation of the firing rate in the non-preferred 
direction were considerably higher for non-simultaneous conduc- 
tances (Fig. 4e, f). The failure of a cell to attenuate its response reliably 
for movement in the non-preferred direction should negatively affect 
its ability to encode direction. Indeed, each recorded cell showed 
greater direction selectivity for the simultaneous conductances 
(Fig. 4g). Thus, the computation underlying directional selectivity 
depends on covariation of excitatory and inhibitory synaptic inputs 
and the resulting cancellation of noise shared between the circuits 
providing each type of input. 

Computation in the retina follows a basic plan found in many other 
neural circuits: signals in a common population of inputs diverge to 
parallel and functionally dissimilar pathways, and integration of the 
signals from multiple parallel pathways governs the output of the circuit. 
Divergence into separate excitatory and inhibitory circuits is a prominent 
example of such a motif. Noise in shared inputs naturally causes covari- 
ation of signals in the parallel pathways. The strength of such noise 
correlations will depend on cellular properties within the network, 
the stimulus delivered (Fig. 4) and the state of the network. Thus, 
excitatory and inhibitory inputs to cells in some, but not all, circuits are 
expected to show strong noise correlations, as indeed is the case in barrel 
cortex. Here we put such noise correlations in the context of the 
coding of physiologically relevant stimuli. Our results reveal a critical 
role for noise correlations in maintaining appropriate cancellation of 
excitatory and inhibitory inputs and thus sharpening tuning to specific 
stimuli. This work provides an example of neurons that perform com- 
putations reliant on noise correlations. Given the prevalence of circuits in 
which feed-forward inhibition shapes neural responses, noise 
correlations probably have a similar role in other neural circuits. 

Figure 4 | Strength and impact of noise correlations in synaptic inputs to 
ON-OFF directionally selective ganglion cells. a, A bar of light was moved in 
eight directions, at 45° increments in random order. b, Extracellular (cell- 
attached configuration) spike responses to the moving bar. c, Examples of 
simultaneously recorded conductances showing tuning of excitatory (blue) and 
inhibitory (red) conductances. d, Simultaneous conductances (black) show 
strong noise correlations that are largely absent from the non-simultaneous 
conductances (green). e, Normalized directional tuning (spike count versus 
direction) from a single dynamic-clamp experiment (mean ± s.d.) for 20 trials 
of simultaneous or shuffled simultaneous conductances. Insets at 45° (preferred 
direction) and 225° (non-preferred direction) show spike rasters. f, Standard 
deviation of the normalized spike count is significantly smaller for 
simultaneous trials than for shuffled trials in non-preferred directions (135- 
315°; P < 0.05, 10 cells). Standard deviations in the preferred direction were 
similar. g, Direction selectivity index (DSI; see Methods) is 2.0 ± 0.2 times 
larger for simultaneous conductances than for shuffled conductances 
(mean ± s.e.m., 10 cells, P = 0.0002). 


METHODS SUMMARY 


We took electrical recordings from midget ganglion cells in primate and ON-OFF 
DSGCs in mouse retinas using patch-clamp techniques as previously 
described**”’. Light stimuli were delivered from light-emitting diodes or an 
organic light-emitting diode monitor (eMagin). Mean light levels for all experi- 
ments were near 5,000 absorbed photons per cone per second. 

The 10-ms cycle period during the simultaneous conductance recordings allows 
us to resolve input at 50 Hz and below. The fraction of the measured current 
variance at this cycle time was determined by calculating the fraction of the 
variance of the non-simultaneous (constant-voltage) conductances that can be 
accounted for by the variance of the simultaneous conductances. 

Signal-to-noise ratios of spike outputs were calculated by forming spike trains of 
zeroes and ones from each trial, with 1-ms resolution. The mean and trial residuals 
of these spike trains were calculated and the power spectra of these functions were 
assessed and corrected for sample number bias*’. Power spectra were integrated 
between 1 and 20 Hz and the result for the mean responses was divided by that for 
the residuals (Supplementary Fig. 6). 

Spike number in ON-OFF DSGCs in response to the moving bar was summed 
over the entire duration of the bar’s movement. The direction selectivity index 
was calculated as $\mathrm{DSI} = \left|\sum_i \mathbf{v}_i \big/ \sum_i r_i\right|$, where $\mathbf{v}_i$ are vectors of length $r_i$, equal to the 
normalized firing rate, that point in the direction of the moving bar that produced 
the presented conductances. 

Current injected into a cell (I) during dynamic-clamp experiments was calcu- 
lated as 

$$I(t) = G_{\mathrm{exc}}(t)\,\bigl(V(t-\Delta t) - E_{\mathrm{exc}}\bigr) + G_{\mathrm{inh}}(t)\,\bigl(V(t-\Delta t) - E_{\mathrm{inh}}\bigr)$$

where $G_{\mathrm{exc}}$ and $G_{\mathrm{inh}}$ are a pair of conductances recorded during light stimulation, 
$V$ is the cell’s membrane potential, and $E_{\mathrm{exc}}$ and $E_{\mathrm{inh}}$ are reversal potentials set 
respectively at 0 mV and −80 mV. Changing the inhibitory reversal potential, $E_{\mathrm{inh}}$, 
to −50 mV did not substantially affect the results. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 

METHODS 


Electrical recordings were made from midget ganglion cells in primate and ON- 
OFF DSGCs in mouse retinas as previously described”. Midget ganglion cells 
were identified by their relatively sustained response to light steps and characteristic 
morphology'’”***. ON-OFF DSGCs were identified by a combination of at least 
two of the following criteria: an on-off light response to a brief light step, a 
bistratified morphology and a directionally selective spike response. 

Light stimuli were delivered from light-emitting diodes or an organic light-emitting 
diode monitor (eMagin). Mean light levels for all experiments were near 5,000 
absorbed photons per cone per second. Full-field stimuli consisted of 10 s of constant 
light followed by 10 s of 50%-contrast modulated light (low-pass-filtered at 60 Hz) 
repeated for 5-20 trials. Moving bars were 180 µm wide, 720 µm long, moved at 
864 µm s⁻¹ along the long axis and had a contrast of between 100 and 150%. 

For all recordings a flat-mounted piece of retina was superfused with warmed 
(31-34 °C) and oxygenated (5% CO2, 95% O2) Ames solution. Midget cell 
dynamic-clamp experiments were performed with receptors mediating excitatory 
and inhibitory synaptic input blocked (10 µM NBQX, 1 µM strychnine, 10 µM 
gabazine). Pipettes for voltage-clamp recordings were filled with a Cs-based 
internal solution (105 mM CsCH3SO3, 10 mM TEA-Cl, 20 mM HEPES, 10 mM 
EGTA, 5 mM Mg-ATP, 0.5 mM Tris-GTP and 2 mM QX-314, pH ~7.3, 
~280 mosM). Pipettes for dynamic-clamp experiments were filled with a 
K-based internal solution (110 mM K aspartate, 1 mM MgCl2, 10 mM HEPES, 
5 mM NMDG, 0.5 mM CaCl2, 10 mM phosphocreatine, 4 mM Mg-ATP and 
0.5 mM Tris-GTP, pH ~7.2, ~280 mosM). Liquid junction potentials were 
~10 mV and were not compensated throughout the text. Low access resistance 
was critical, and only cells with access resistance below 20 MΩ were included for 
analysis. Access resistance was partially compensated for (75% for experiments 
using an Axopatch 200B amplifier; 50% compensation and prediction for experi- 
ments using a Multiclamp 700B amplifier). Conductances were derived from 
excitatory and inhibitory synaptic currents by dividing the currents by assumed 
driving forces corresponding to voltages of −62 and +62 mV, respectively. 

Both ganglion cell types showed evidence for NMDA-receptor-mediated con- 
ductances (J-shaped I-V plots that became linear in the presence of 10 µM APV). 
The presence of an NMDA conductance could cause noise correlations to be 
substantially underestimated if the voltage is substantially below the excitatory 
reversal potential. However, we observed only a weak impact of this conductance 
when noise correlations were compared before and after application of APV. 
Results from two cells recorded only in the presence of APV were included in 
the full data set. 

The 10-ms cycle period during the simultaneous conductance recordings allows 
us to resolve input at 50 Hz and below. The fraction of the measured current 
variance at this cycle time was determined by calculating the fraction of the 
variance of the non-simultaneous (constant-voltage) conductances that can be 
accounted for by the variance of the simultaneous conductances. 

Signal-to-noise ratios of spike outputs were calculated by forming spike trains of 
zeroes and ones from each trial, with 1-ms resolution. The mean and trial residuals 
of these spike trains were calculated and the power spectra of these functions were 
assessed and corrected for sample number bias**. Power spectra were integrated 
between 1 and 20Hz and the result for the mean responses was divided by that for 
the residuals (Supplementary Fig. 6). 

Spike number in ON-OFF DSGCs in response to the moving bar was summed 
over the entire duration of the bar’s movement. The direction selectivity index’° 
was calculated as $\mathrm{DSI} = \left|\sum_i \mathbf{v}_i / \sum_i r_i\right|$, where $\mathbf{v}_i$ are vectors of lengths $r_i$, equal to the 
normalized firing rate, that point in the direction of the moving bar that produced 
the presented conductances. 
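
As a hedged illustration of the direction selectivity index defined above, the sketch below sums response vectors in Cartesian coordinates and divides the length of the resultant by the summed firing rates. The function name and the representation of directions in degrees are assumptions made for this example.

```python
import math


def direction_selectivity_index(directions_deg, rates):
    """DSI = |sum_i v_i| / sum_i r_i.

    Each v_i has length r_i (firing rate) and points along the
    corresponding bar direction (degrees).
    """
    x = sum(r * math.cos(math.radians(d)) for d, r in zip(directions_deg, rates))
    y = sum(r * math.sin(math.radians(d)) for d, r in zip(directions_deg, rates))
    total = sum(rates)
    if total == 0:
        return 0.0  # no spikes in any direction: treat as unselective
    return math.hypot(x, y) / total
```

A cell that fires for a single direction gives a DSI of 1, and a cell that fires equally for all evenly spaced directions gives a DSI of approximately 0. Because the index is a ratio, it does not depend on how the firing rates are normalized.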

Current injected into a cell (I) during dynamic-clamp experiments” was calcu- 
lated as 


$$I(t) = G_{\mathrm{exc}}(t)\,\bigl(V(t-\Delta t) - E_{\mathrm{exc}}\bigr) + G_{\mathrm{inh}}(t)\,\bigl(V(t-\Delta t) - E_{\mathrm{inh}}\bigr)$$


where $G_{\mathrm{exc}}$ and $G_{\mathrm{inh}}$ are a pair of conductances recorded during light stimulation, 
$V$ is the cell’s membrane potential, and $E_{\mathrm{exc}}$ and $E_{\mathrm{inh}}$ are reversal potentials set 
respectively at 0 mV and −80 mV. Changing the inhibitory reversal potential, $E_{\mathrm{inh}}$, 
to −50 mV did not substantially affect the results. 
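
The dynamic-clamp current can be written out directly from the expression above. The sketch below is illustrative only: the function name, argument names and units (nS, mV, pA) are assumptions, and the sign convention is taken as written in the equation, without reference to any particular amplifier.

```python
E_EXC_MV = 0.0    # excitatory reversal potential from the text
E_INH_MV = -80.0  # inhibitory reversal potential from the text


def dynamic_clamp_current(g_exc_ns, g_inh_ns, v_prev_mv,
                          e_exc_mv=E_EXC_MV, e_inh_mv=E_INH_MV):
    """Current (pA) for one update step.

    v_prev_mv is the membrane potential sampled one time step earlier,
    V(t - dt); nS multiplied by mV gives pA.
    """
    return (g_exc_ns * (v_prev_mv - e_exc_mv)
            + g_inh_ns * (v_prev_mv - e_inh_mv))
```

For example, with 2 nS of excitation and 1 nS of inhibition at −60 mV, the excitatory term is −120 pA and the inhibitory term is +20 pA. Passing `e_inh_mv=-50.0` reproduces the control condition in which the inhibitory reversal potential was changed.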

Correlations were calculated using the ‘xcov’ function in MATLAB, release 
2009a (MathWorks) and normalized using the ‘coeff’ option. Briefly, this function 
calculates the cross-correlation after subtracting the means from each trial and 
normalizes by the geometric mean of the autocorrelation (see Supplementary 
Information, equation (2.1)). 
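
For readers without MATLAB, the normalization described above can be sketched in plain Python. This is not the MathWorks implementation: the function name is hypothetical, and the lag convention, c(m) = Σ x[n+m]·y[n], is an assumption of this sketch.

```python
import math


def xcov_coeff(x, y, max_lag):
    """Mean-subtracted cross-correlation, normalized so that each
    signal's autocovariance at zero lag equals 1.

    Returns a dict mapping lag m to c(m) = sum_n dx[n+m] * dy[n] / norm.
    """
    n = len(x)
    if len(y) != n:
        raise ValueError("x and y must have the same length")
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    dy = [b - my for b in y]
    # Geometric mean of the two zero-lag autocovariances.
    norm = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if norm == 0:
        raise ValueError("a constant signal has no defined correlation")
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                total += dx[j] * dy[i]
        result[lag] = total / norm
    return result
```

Under this convention a signal correlated with itself gives a value of 1 at zero lag, and if `y` is a delayed copy of `x` the peak falls at a negative lag.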


32. Polyak, S. & Willmer, E. N. Retinal structure and colour vision. Doc. Ophthalmol. 3, 
24-56 (1949). 
COT drives resistance to RAF inhibition through 
MAP kinase pathway reactivation 

Oncogenic mutations in the serine/threonine kinase B-RAF (also 
known as BRAF) are found in 50-70% of malignant melanomas’. 
Pre-clinical studies have demonstrated that the B-RAF(V600E) 
mutation predicts a dependency on the mitogen-activated protein 
kinase (MAPK) signalling cascade in melanoma” *—an observation 
that has been validated by the success of RAF and MEK inhibitors in 
clinical trials’. However, clinical responses to targeted anticancer 
therapeutics are frequently confounded by de novo or acquired 
resistance'®”. Identification of resistance mechanisms in a manner 
that elucidates alternative ‘druggable’ targets may inform effective 
long-term treatment strategies'®. Here we expressed ~600 kinase 
and kinase-related open reading frames (ORFs) in parallel to inter- 
rogate resistance to a selective RAF kinase inhibitor. We identified 
MAP3K8 (the gene encoding COT/Tpl2) as a MAPK pathway ago- 
nist that drives resistance to RAF inhibition in B-RAF(V600E) cell 
lines. COT activates ERK primarily through MEK-dependent 
mechanisms that do not require RAF signalling. Moreover, COT 
expression is associated with de novo resistance in B-RAF(V600E) 
cultured cell lines and acquired resistance in melanoma cells and 
tissue obtained from relapsing patients following treatment with 
MEK or RAF inhibitors. We further identify combinatorial MAPK 
pathway inhibition or targeting of COT kinase activity as possible 
therapeutic strategies for reducing MAPK pathway activation in 
this setting. Together, these results provide new insights into res- 
istance mechanisms involving the MAPK pathway and articulate an 
integrative approach through which high-throughput functional 
screens may inform the development of novel therapeutic strategies. 

To identify kinases capable of circumventing RAF inhibition, we 
assembled and stably expressed 597 sequence-validated kinase ORF 
clones representing ~75% of annotated kinases (Center for Cancer 
Systems Biology (CCSB)/Broad Institute Kinase ORF Collection) in 
A375, a B-RAF(V600E) malignant melanoma cell line that is sensitive 
to the RAF kinase inhibitor PLX4720™ (Fig. 1a, b, Supplementary 
Table 1 and Supplementary Fig. 2). ORF-expressing cells treated with 
1 µM PLX4720 were screened for viability relative to untreated cells 
and normalized to an assay-specific positive control, MEK1(S218/222D) 
(MEK1DD)15 (Supplementary Table 2 and summarized in Supplementary 
Fig. 1). Nine ORFs conferred resistance at levels exceeding 
two standard deviations from the mean (Fig. 1b and Supplementary 
Table 2) and were selected for follow-up analysis (Supplementary Fig. 3). 
Three of the nine candidate ORFs were receptor tyrosine kinases, under- 
scoring the potential of this class of kinases to engage resistance 
pathways. Resistance effects were validated and prioritized across a 
multi-point PLX4720 drug concentration scale in the B-RAF(V600E) 
cell lines A375 and SKMEL28. The Ser/Thr MAP kinase kinase kinases 
(MAP3Ks) MAP3K8 (COT/Tpl2) and RAF1 (C-RAF) emerged as top 
candidates from both cell lines; these ORFs shifted the PLX4720 
half-maximal growth inhibitory concentration (GI50) by 10-600-fold 
without affecting viability (Supplementary Table 3 and Supplementary Figs 4 
and 5). Both COT and C-RAF reduced sensitivity to PLX4720 in mul- 
tiple B-RAF(V600E) cell lines (Fig. 1c) confirming the ability of these 
kinases to mediate resistance to RAF inhibition. 

Next, we tested whether overexpression of these genes was sufficient 
to activate the MAPK pathway. At baseline, COT expression increased 
ERK phosphorylation in a manner comparable to MEK1DD, consistent 
with MAP kinase pathway activation (Fig. 2a and Supplementary Fig. 6). 
Overexpression of wild-type COT or C-RAF resulted in constitutive 
phosphorylation of ERK and MEK in the presence of PLX4720, whereas 
kinase-dead derivatives had no effect (Fig. 2a and Supplementary Fig. 7). 
Based on these results, we proposed that COT and C-RAF drive resist- 
ance to RAF inhibition predominantly through re-activation of MAPK 
signalling. Notably, of the nine candidate ORFs from our initial screen, a 
subset (three) did not show persistent ERK/MEK phosphorylation fol- 
lowing RAF inhibition, suggesting MAPK pathway-independent altera- 
tion of drug sensitivity (Supplementary Fig. 8). 

Several groups have shown that C-RAF activation and hetero- 
dimerization with B-RAF constitute critical components of the cellular 
response to B-RAF inhibition’*’. In A375 cells, endogenous C-RAF- 
B-RAF heterodimers were measurable and inducible following treatment 
with PLX4720 (Supplementary Fig. 9). However, endogenous C-RAF 
phosphorylation at S338—an event required for C-RAF activation— 
remained low (Supplementary Fig. 9). In contrast, ectopically expressed 
C-RAF was phosphorylated on S338 (Supplementary Fig. 9) and its 
PLX4720 resistance phenotype was associated with sustained MEK/ 
ERK activation (Fig. 2a, Supplementary Fig. 9). Moreover, ectopic ex- 
pression of a high-activity C-RAF truncation mutant (C-RAF(W22)) 
was more effective than wild-type C-RAF in mediating PLX4720 resist- 
ance and ERK activation (Supplementary Fig. 10), further indicating that 
Figure 1 | An ORF-based functional screen identifies COT and C-RAF 
kinases as drivers of resistance to B-RAF inhibition. a, Overview of the CCSB/ 
Broad Institute Kinase ORF collection. Kinase classification and number of 
kinases per classification are noted. b, A375 cells expressing the CCSB/Broad 
Institute Kinase ORF collection were assayed for relative viability in 1 µM 
PLX4720 and normalized to constitutively active MEK1 (MEK1DD). Nine 
ORFs (orange disks) scored 2 standard deviations (red dashed line, 58.64%) 
from the mean of all ORFs (green dashed line, 44.26%). c, Indicated ORFs were 
expressed in five B-RAF(V600E) cell lines and treated with DMSO or 1 µM 
PLX4720. Viability (relative to DMSO) was quantified after 4 days. Error bars 
represent standard deviation between replicates (n = 6). 


elevated C-RAF activity may direct resistance to this agent. Consis- 
tent with this model, oncogenic alleles of NRAS and KRAS conferred 
PLX4720 resistance in A375 cells (Fig. 2b) and yielded sustained 
C-RAF(S338) and ERK phosphorylation in the context of drug treat- 
ment (Fig. 2c). Thus, although genetic alterations that engender 
C-RAF activation (for example, oncogenic RAS mutations) tend to show 
mutual exclusivity with B-RAF(V600E) mutation, such co-occurring 
events”?! might be favoured in the context of acquired resistance to 
B-RAF inhibition. 

To investigate the role of COT in melanoma, we first determined its 
expression in human melanocytes. We found that primary immortalized 
melanocytes (B-RAF wild-type) expressed COT (Fig. 2d), although 
ectopic B-RAF(V600E) expression reduced MAP3K8 mRNA levels (Sup- 
plementary Fig. 11) and rendered COT protein undetectable (Fig. 2d). 
Conversely, whereas ectopically expressed COT was only weakly detect- 
able in A375 cells (Fig. 2a, e), short hairpin RNA (shRNA)-mediated 
depletion of endogenous B-RAF(V600E) caused an increase in COT 
protein levels that correlated with the extent of B-RAF knockdown 
(Fig. 2e). Moreover, treatment of COT-expressing A375 cells with 
PLX4720 led to a dose-dependent increase in COT protein (Fig. 2a) 
without affecting ectopic MAP3K8 mRNA levels (Supplementary 
Fig. 11). Thus, oncogenic B-RAF may antagonize COT expression 
largely through altered protein stability (Fig. 2a, d, e and Supplemen- 
tary Fig. 11), and B-RAF inhibition may potentiate the outgrowth of 
COT-expressing cells during the course of treatment. Notably, neither 
C-RAF nor B-RAF alone or in combination was required for ERK 
phosphorylation in the context of COT expression, even in the presence 
of PLX4720 (Fig. 2e, f and Supplementary Fig. 12), suggesting that 
COT expression is sufficient to induce MAP kinase pathway activation 
in a RAF-independent manner. 

We predicted that cell lines expressing elevated COT in a 
B-RAF(V600E) background should show de novo resistance to 
PLX4720 treatment. To identify such instances, we screened a panel 
of cell lines for evidence of MAP3K8 copy number gains coincident with 
the B-RAF(V600E) mutation. Of 534 cell lines that had undergone copy 
number analysis and mutation profiling, 38 cell lines (7.1%) contained 
the B-RAF(V600E) mutation. Within this subgroup, two cell lines— 
OUMS-23 (colon cancer) and RPMI-7951 (melanoma)—also showed 
evidence of chromosomal copy gains spanning the MAP3K8 locus 
(Fig. 3a and Supplementary Fig. 13) and robust COT protein expression 
(Fig. 3b and Supplementary Fig. 14). We also screened a panel of mel- 
anoma short-term cultures for COT protein expression. Only one of 
these lines expressed COT: M307, a short-term culture derived from a 
B-RAF(V600E) tumour that developed resistance to allosteric MEK 
inhibition following initial disease stabilization’* (Fig. 3c). All three cell 
lines were refractory to PLX4720 treatment, with GI50 values in the 
range of 8-10 µM (Fig. 3d), and showed sustained ERK phosphorylation 
in the context of B-RAF inhibition (Fig. 3e, f). OUMS-23 and 
RPMI-7951 are MAPK pathway inhibitor-naive cell lines, implying that 
COT may confer de novo resistance to RAF inhibition (a phenomenon 
observed in ~10% of B-RAF(V600E) melanomas’). 

Next, we examined COT expression in the context of resistance to 
the clinical RAF inhibitor PLX4032 by obtaining biopsy material from 
three patients with metastatic, B-RAF(V600E) melanoma. Each case 
consisted of frozen, lesion-matched biopsy material obtained before 
and during treatment (‘pre-treatment’ and ‘on-treatment’; Fig. 3g and 
Supplementary Table 4); additionally, one sample contained two inde- 
pendent biopsy specimens from the same relapsing tumour site (‘post- 
relapse’; Fig. 3g). Consistent with the experimental models presented 
above, quantitative real-time PCR with reverse transcription (qRT- 
PCR) analysis revealed increased MAP3K8 mRNA expression concur- 
rent with PLX4032 treatment in two of three cases. MAP3K8 mRNA 
levels were further increased in a relapsing specimen relative to its 
pre-treatment and on-treatment counterparts (Fig. 3g, Patient 1). An 
additional, unmatched relapsed malignant melanoma biopsy showed 
elevated MAP3K8 mRNA expression comparable to levels observed in 
RAF inhibitor-resistant, MAP3K8-amplified cell lines (Supplementary 
Fig. 15). This specimen also exhibited robust MAPK pathway activa- 
tion and elevated expression of B-RAF, C-RAF and COT relative to 
matched normal skin or B-RAF(V600E) cell lines (Supplementary Fig. 
15). Sequencing studies of this tumour revealed no additional muta- 
tions in B-RAF, NRAS or KRAS (data not shown). These analyses 
provided clinical evidence that COT-dependent mechanisms may be 
operant in at least some PLX4032-resistant malignant melanomas. 

To determine if COT might actively regulate MEK/ERK phosphor- 
ylation in B-RAF(V600E) cells that harbour naturally elevated COT 
expression, we introduced shRNA constructs targeting MAP3K8/COT 
into RPMI-7951 cells. Depletion of COT suppressed RPMI-7951 viab- 
ility (Supplementary Fig. 16) and decreased ERK phosphorylation 
(Fig. 3h), implying that targeting COT kinase activity might suppress 
MEK/ERK phosphorylation in cancer cells with COT overexpression 
or amplification. Treatment of RPMI-7951 cells with a small molecule 
COT kinase inhibitor” resulted in dose-dependent suppression of 


16 DECEMBER 2010 | VOL 468 | NATURE | 969 






Figure 2 | Resistance to B-RAF inhibition via MAPK pathway activation. 
a, Indicated ORFs were expressed in A375. Levels of phosphorylated MEK and 
ERK were assayed after 18 h treatment with DMSO (−) or PLX4720 
(concentration noted). GFP, green fluorescent protein; V5-C-RAF, V5-COT, 
V5-GFP, V5 epitope-tagged C-RAF, COT and GFP, respectively; PLX, 
PLX4720. b, Proliferation of A375 expressing indicated ORFs. Error bars 
represent standard deviation between replicates (n = 6). c, C-RAF (S338) and 
ERK phosphorylation in lysates from A375 expressing indicated ORFs. VINC, 
vinculin; WT, wild type; pS338, C-RAF phosphorylated on Ser338. d, COT 
expression in lysates from immortalized primary melanocytes expressing 
B-RAF(V600E) or empty vector. MAP3K8 mRNA has an internal start codon 
(30M) resulting in two protein products of different lengths; amino acids 1-467 
or 30-467, noted with arrows. e, COT expression and ERK phosphorylation in 
lysates from A375 expressing indicated ORFs following shRNA-mediated 
B-RAF depletion (shBRAF) relative to control shRNA (shLuc). f, ERK 
phosphorylation in lysates from A375 expressing indicated ORFs following 
shRNA-mediated C-RAF depletion (shCRAF) or control shRNA (shLuc), after 
18 h treatment with DMSO (−) or 1 µM PLX4720 (+). 

Figure 3 | COT expression predicts resistance to B-RAF inhibition in cancer 
cell lines. a, MAP3K8 copy numbers. Red bars, MAP3K8 amplification; blue 
bars, non-amplified COT. b, COT expression in B-RAF(V600E) cell lines and 
c, short-term cultures. d, PLX4720 GI50 in B-RAF(V600E) cell lines. Colours as 
in a. e, MEK and ERK phosphorylation after treatment with DMSO or PLX4720 
(concentration indicated). f, ERK phosphorylation in M307 lysates (AZD-R; 
AZD6244-resistant) treated with DMSO or 1 µM PLX4720 (PLX) or CI-1040 
(CI). g, MAP3K8 mRNA expression (qRT-PCR) in patient/lesion-matched 
PLX4032-treated metastatic melanoma tissue samples. Patients 1 and 3 had 
multiple biopsies from the same lesion. Error bars represent s.e.m. (n = 3). Pt, 
patient; U, undetermined/undetectable; TBP, TATA binding protein. h, ERK 
and MEK phosphorylation in RPMI-7951 following shRNA-mediated COT 
depletion (shCOT) versus control (shLuc) and treatment with DMSO (−) or 
1 µM PLX4720 (+). ERK and MEK phosphorylation are quantified. i, ERK and 
MEK phosphorylation in RPMI-7951 after 1 h treatment with a small molecule 
COT kinase inhibitor. ERK and MEK phosphorylation are quantified. 
Figure 4 | COT-expressing B-RAF(V600E) cell lines exhibit resistance to 
allosteric MEK inhibitors. a, CI-1040 GI50 in a panel of B-RAF(V600E) cell 
lines. Red bars, COT expression/amplification; blue bars, undetectable/non- 
amplified COT. b, MEK and ERK phosphorylation in lysates from indicated 
cell lines treated with DMSO or CI-1040 (concentration noted). c, Fold change 
(relative to MEK1) GI50 of A375 ectopically expressing the indicated ORFs for 
PLX4720, RAF265, CI-1040 and AZD6244. d, ERK phosphorylation in A375 
expressing indicated ORFs following treatment with DMSO or 1 µM of 
PLX4720, RAF265, CI-1040 or AZD6244. e, Viability of A375 expressing the 
indicated ORFs and treated with DMSO, PLX4720 (concentration indicated) 
and PLX4720 in combination with CI-1040 or AZD6244 (all 1 µM). Error bars 
represent the standard deviation (n = 6). f, ERK phosphorylation in A375 
expressing indicated ORFs following treatment with DMSO, PLX4720 (1 µM) 
or PLX4720 in combination with CI-1040 or AZD6244 (all 1 µM). 


MEK and ERK phosphorylation, providing additional evidence that 
COT contributes to MEK/ERK activation in these cells (Fig. 3i). 

We then considered whether COT-expressing cancer cells remain 
sensitive to MAPK pathway inhibition at a target downstream of COT 
or RAF. Here, we queried the OUMS-23 and RPMI-7951 cell lines for 
sensitivity to the MEK1/2 inhibitor CI-1040. Interestingly, both cell 
lines were refractory to MEK inhibition (Fig. 4a) and displayed sustained 
ERK phosphorylation even at 1 µM CI-1040 (Fig. 4b). Ectopic 
COT expression in A375 and SKMEL28 cells also conferred decreased 
sensitivity to the MEK inhibitors CI-1040 and AZD6244, suggesting 
that COT expression alone was sufficient to induce this phenotype 
(Fig. 4c, d and Supplementary Fig. 17). Similar to results observed with 
pharmacological MEK inhibitors, MEK1/2 knockdown only modestly 
suppressed COT-mediated ERK phosphorylation in A375 cells 
(Supplementary Fig. 18). In accordance with prior observations”, 
these data raised the possibility that COT may activate ERK through 
MEK-independent as well as MEK-dependent mechanisms. To test 
this hypothesis directly, we performed an in vitro kinase assay using 
recombinant COT and ERK1. Indeed, recombinant COT induced 
pThr 202/Tyr 204 phosphorylation of ERK1 in vitro (Supplementary 
Fig. 18), indicating that in certain contexts COT expression may 
potentiate ERK activation in a MEK-independent manner. 

In experimental models, the use of RAF and MEK inhibitors in combination 
can override resistance to single agents'*. We therefore reasoned 
that combined RAF/MEK inhibition might circumvent COT-driven resistance. 
In the setting of ectopic COT expression, exposure to AZD6244 
or CI-1040 in combination with PLX4720 (1 µM each) reduced cell growth 
and pERK expression more effectively than did single-agent PLX4720, 
even at concentrations of 10 µM (Fig. 4e, f and Supplementary Fig. 19). 
These data underscore the importance of this pathway in B-RAF(V600E) 
tumour cells and support earlier findings’* that dual B-RAF/MEK inhibition 
may help circumvent resistance to RAF inhibitors. 

B-RAF mutations are found in ~8% of all cancers and at high 
frequencies in malignant melanoma, colon and thyroid cancers’. 


The clinical promise of selective RAF inhibitors has widespread rami- 
fications for patient treatment, yet single-agent targeted therapy is 
almost invariably followed by relapse due to acquired drug resistance. 
Our results suggest that ORF-based, systematic functional screening 
may offer a powerful means to identify clinically relevant resistance 
mechanisms that also specify novel treatment strategies. In particular, 
resistance to RAF inhibition can be achieved by multiple MAP3K- 
dependent mechanisms of MEK/ERK reactivation but might be inter- 
cepted through combined therapeutic modalities for MAPK pathway 
inhibition (for example, RAF/MEK or RAF/COT combinations). 
Future systematic drug resistance studies may be expanded to a genome 
scale that encompasses many compounds, thereby enabling compre- 
hensive identification of both therapy-specific resistance genes and 
drug targets of novel therapeutics. 


METHODS SUMMARY 


The arrayed, lentiviral ORF screen was performed as described previously”’. 
Effects of individual ORFs on drug resistance were determined by measuring 
differential viability (ratio of raw viability in 1 µM PLX4720 over control) and 
subsequent normalization to an assay-specific positive control, MEK1DD. 
Secondary screens were performed with the top nine candidate ORFs in 96-well 
format in A375 and SKMEL28 cells. Prioritization was accomplished via generation 
of a GI50 for each ORF across a multi-point PLX4720 concentration range in 
both cell lines. The effects of identified resistance ORFs on MAPK pathway activation 
were demonstrated using both biochemical and cell biological approaches. 
Cell line copy number data was obtained as previously described’’. Detailed 
descriptions of all procedures are included in Methods. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Center for Cancer Systems Biology (CCSB)/Broad Institute Kinase Open 
Reading Frame Collection. We assembled a library of 597 kinase ORFs in 
pDONR-223 Entry vectors (Invitrogen). Individual clones were end-sequenced 
using vector-specific primers in both directions. Clones with substantial deviations 
from reported sequences were discarded. Entry clones and sequences are available 
via Addgene (http://www.addgene.org/human_kinases). Kinase ORFs were 
assembled from multiple sources; 337 kinases were isolated as single clones from 
the ORFeome 5.1 collection (http://horfdb.dfci.harvard.edu), 183 kinases were 
cloned from normal human tissue RNA (Ambion) by reverse transcription and 
subsequent PCR amplification to add Gateway sequences (Invitrogen), 64 kinases 
were cloned from templates provided by the Harvard Institute of Proteomics 
(HIP), and 13 kinases were cloned into the Gateway system from templates 
obtained from collaborating laboratories. The Gateway-compatible lentiviral 
vector pLX-Blast-V5 was created from the pLKO.1 backbone. LR Clonase enzymatic 
recombination reactions were performed to introduce the 597 kinases into pLX- 
Blast-V5 according to the manufacturer’s protocol (Invitrogen). 
High-throughput ORF screening. A375 melanoma cells were plated in 384-well 
microtitre plates (500 cells per well). The following day, cells were spin-infected 
with the lentivirally-packaged kinase ORF library in the presence of 8 µg ml−1 
polybrene. At 48 h post-infection, media were replaced with standard growth 
media (two replicates), media containing 1 µM PLX4720 (two replicates, two time 
points) or media containing 10 µg ml−1 blasticidin (two replicates). After 4 and 
6 days, cell growth was assayed using Cell Titer-Glo (Promega) according to 
manufacturer instructions. The entire experiment was performed twice. 
Identification of candidate resistance ORFs. Raw luminescence values were 
imported into Microsoft Excel. Infection efficiency was determined by the per- 
centage of duplicate-averaged raw luminescence in blasticidin-selected cells rela- 
tive to non-selected cells. ORFs with an infection efficiency of less than 0.70 were 
excluded from further analysis along with any ORF having a standard deviation of 
>15,000 raw luminescence units between duplicates. To identify ORFs whose 
expression affects proliferation, we compared the duplicate-averaged raw lumin- 
escence of individual ORFs against the average and standard deviation of all 
control-treated cells via the z-score, or standard score, below, 

$$z = \frac{x - \mu}{\sigma}$$

where $x$ is the average raw luminescence of a given ORF, $\mu$ is the mean raw 
luminescence of all ORFs and $\sigma$ is the standard deviation of the raw luminescence 
of all wells. Any individual ORF with a z-score > +2 or < −2 was annotated as 
affecting proliferation and removed from final analysis. Differential proliferation 
was determined by the percentage of duplicate-averaged raw luminescence values 
in PLX4720 (1 µM)-treated cells relative to untreated cells. Subsequently, differential 
proliferation was normalized to the positive control for PLX4720 resistance, 
MEK1(S218/222D) (MEK1DD), with MEK1DD differential proliferation = 1.0. 
MEK1DD-normalized differential proliferation for each individual ORF was averaged 
across two duplicate experiments, with two time points for each experiment 
(day 4 and day 6). A z-score was then generated, as described above, for average 
MEK1DD-normalized differential proliferation. ORFs with a z-score of >2 were 
considered hits and were followed up in the secondary screen. 
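
The hit-calling arithmetic above (differential proliferation, normalization to the positive control, then a z-score cut-off) can be sketched as follows. The original analysis was performed in Microsoft Excel; this Python version is only an illustration. The function names, the dictionary layout, the use of the population standard deviation and the exclusion of the control from the z-score population are all assumptions, and the infection-efficiency and replicate-variance filters are omitted for brevity.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standard scores z = (x - mu) / sigma for a list of numbers."""
    mu = mean(values)
    sigma = pstdev(values)  # population SD: an assumption, not stated in the text
    if sigma == 0:
        raise ValueError("all values are identical; z-score is undefined")
    return [(v - mu) / sigma for v in values]


def call_hits(treated, untreated, control="MEK1DD", threshold=2.0):
    """Return ORFs whose control-normalized differential proliferation
    has a z-score above `threshold`.

    treated / untreated: dicts mapping ORF name to duplicate-averaged raw
    luminescence with and without drug. `control` names the positive
    control, whose differential proliferation is scaled to 1.0.
    """
    control_ratio = treated[control] / untreated[control]
    names = [name for name in treated if name != control]
    normalized = [(treated[n] / untreated[n]) / control_ratio for n in names]
    return [n for n, z in zip(names, z_scores(normalized)) if z > threshold]
```

In a toy plate where twenty ORFs retain 40% viability in drug and one retains 90%, only the latter exceeds the z > 2 cut-off.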
ORF and shRNA expression. ORFs were expressed from pLX-Blast-V5 (lentiviral) 
or pWZL-Blast, pBABE-Puro or pBABE-zeocin (retroviral) expression plasmids. 
For lentiviral transduction, 293T cells were transfected with 1 µg of pLX-Blast-V5-ORF 
or pLKO.1-shRNA, 900 ng Δ8.9 (gag, pol) and 100 ng VSV-G using 6 µl 
Fugene6 transfection reagent (Roche). Viral supernatant was harvested 72 h post-transfection. 
Mammalian cells were infected at a 1:10-1:20 dilution of virus in 6-well 
plates in the presence of 5 µg ml−1 polybrene and centrifuged at 2,250 r.p.m. (1,178g) 
for 1 h at 37 °C. Twenty-four hours after infection blasticidin (pLX-Blast-V5, 10 µg 
ml−1) or puromycin (pLKO.1, 0.75 µg ml−1) was added and cells were selected for 
48 h. For retrovirus production, 293T cells were transfected with 1 µg of retroviral 
plasmid-ORF, 1 µg pCL-AMPHO and 100 ng VSV-G, as described above. Cells were 
infected with retrovirus-containing supernatant at a 1:2 dilution in 5 µg ml−1 polybrene 
overnight, followed by media change to growth media. Infection was repeated 
once more (twice total), followed by selection, as above. 
Secondary screen. A375 (1.5 × 10^3) and SKMEL28 cells (3 × 10^3) were seeded in 
96-well plates for 18 h. ORF-expressing lentivirus was added at a 1:10 dilution in 
the presence of 8 µg ml−1 polybrene, and centrifuged at 2,250 r.p.m. (1,178g) and 
37 °C for 1 h. Following centrifugation, virus-containing media were changed to 
normal growth media and allowed to incubate for 18 h. Twenty-four hours after 
infection, DMSO (1:1,000) or 10× PLX4720 (in DMSO) was added to a final 
concentration of 100, 10, 1, 0.1, 0.01, 0.001, 0.0001 or 0.00001 µM. Cell viability 
was assayed using WST-1 (Roche), per manufacturer recommendation, 4 days 
after the addition of PLX4720. 
Cell lines and reagents. A375, SKMEL28, SKMEL30, COLO-679, WM451Lu, 
SKMEL5, Malme 3M, SKMEL30, WM3627, WM1976, WM3163, WM3130, 
WM3629, WM3453, WM3682 and WM3702 were all grown in RPMI (Cellgro), 
10% FBS and 1% penicillin/streptomycin. M307 was grown in RPMI (Cellgro), 
10% FBS and 1% penicillin/streptomycin supplemented with 1 mM sodium pyr- 
uvate. 293T, OUMS-23 and RPMI-7951 cells (ATCC) were grown in MEM 
(Cellgro), 10% FBS and 1% penicillin/streptomycin. Wild-type primary melanocytes 
were grown in HAM’s F10 (Cellgro), 10% FBS and 1% penicillin/streptomycin. 
B-RAF(V600E)-expressing primary melanocytes were grown in TIVA media 
(Ham’s F-10 (Cellgro), 7% FBS, 1% penicillin/streptomycin, 2mM glutamine 
(Cellgro), 100 µM IBMX, 50 ng ml−1 TPA (12-O-tetradecanoyl-phorbol-13- 
acetate), 1 mM 3′,5′-cyclic AMP dibutyrate (dbcAMP; Sigma) and 1 µM sodium 
vanadate). CI-1040 (PubChem ID: 6918454) was purchased from Shanghai Lechen 
International Trading Co., AZD6244 (PubChem ID: 10127622) from Selleck 
Chemicals, and PLX4720 (PubChem ID: 24180719) from Symansis. RAF265 
(PubChem ID: 11656518) was a generous gift from Novartis Pharma AG. Unless 
otherwise indicated, all drug treatments were for 16 h. Activated alleles of NRAS and 
KRAS have been described previously*”. 

Pharmacologic growth inhibition assays. Cultured cells were seeded into 96-well 
plates (3,000 cells per well) for all melanoma cell lines; 1,500 cells were seeded for 
A375. Twenty-four hours after seeding, serial dilutions of the relevant compound 
were prepared in DMSO and added to cells, yielding final drug concentrations ranging 
from 100 µM to 1 × 10^−5 µM, with the final volume of DMSO not exceeding 1%. 
Cells were incubated for 96 h following addition of drug. Cell viability was measured 
using the WST1 viability assay (Roche). Viability was calculated as a percentage of 
control (untreated cells) after background subtraction. A minimum of six replicates 
were performed for each cell line and drug combination. Data from growth-inhibition 
assays were modelled using a nonlinear regression curve fit with a sigmoid dose-response. 
These curves were displayed and GI50 generated using GraphPad Prism 5 
for Windows (GraphPad). Sigmoid-response curves that crossed the 50% inhibition 
point at or above 10 µM have GI50 values annotated as >10 µM. For single-dose 
studies, the identical protocol was followed, using a single dose of indicated drug 
(1 uM unless otherwise noted). 
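
As a rough illustration of how a GI50 and the ">10 µM" annotation relate to a viability curve, the sketch below locates the 50% crossing by linear interpolation on a log10 concentration axis. This is a deliberately simpler stand-in for the sigmoid nonlinear regression performed in GraphPad Prism, not a reimplementation of it; the function names and the example data are hypothetical.

```python
import math


def gi50_by_interpolation(conc_um, viability_pct):
    """Estimate the concentration (uM) at which viability falls to 50%.

    Interpolates linearly in log10(concentration) between the two doses
    that bracket 50% viability. Returns math.inf if viability never drops
    below 50% over the tested range.
    """
    pairs = sorted(zip(conc_um, viability_pct))
    if pairs[0][1] < 50.0:
        raise ValueError("viability is already below 50% at the lowest dose")
    for (c_lo, v_lo), (c_hi, v_hi) in zip(pairs, pairs[1:]):
        if v_lo >= 50.0 > v_hi:
            frac = (v_lo - 50.0) / (v_lo - v_hi)
            log_gi50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_gi50
    return math.inf


def annotate_gi50(gi50_um, cap_um=10.0):
    """Apply the reporting rule from the text: values at or above the cap
    are annotated as '>10 uM'."""
    if gi50_um >= cap_um:
        return ">10 uM"
    return f"{gi50_um:.3g} uM"
```

A curve passing from 75% viability at 0.1 µM to 25% at 1 µM gives an interpolated GI50 of about 0.32 µM, whereas a curve that never falls below 50% is reported as ">10 uM".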

Immunoblots and immunoprecipitations. Cells were washed twice with ice-cold 
PBS and lysed with 1% NP-40 buffer (150 mM NaCl, 50 mM Tris pH 7.5, 
2 mM EDTA pH 8, 25 mM NaF and 1% NP-40) containing 2× protease inhibitors 
(Roche) and 1× Phosphatase Inhibitor Cocktails I and II (CalBioChem). Lysates 
were quantified (Bradford assay), normalized, reduced, denatured (95°C) and 
resolved by SDS gel electrophoresis on 10% Tris/Glycine gels (Invitrogen). 
Protein was transferred to PVDF membranes and probed with primary antibodies 
recognizing pERK1/2 (T202/Y204), pMEK1/2 (S217/221), MEK1/2, MEK1, 
MEK2, C-RAF (rabbit host), pC-RAF (pS338) (Cell Signaling Technology; 
1:1,000), V5-HRP (HRP, horseradish peroxidase; Invitrogen; 1:5,000), COT 
(1:500), B-RAF (1:2,000), Actin (1:1,000), Actin-HRP (1:1,000; Santa Cruz), 
C-RAF (mouse host; 1:1,000; BD Transduction Labs), Vinculin (Sigma; 
1:20,000), AXL (1:500; R&D Systems). After incubation with the appropriate 
secondary antibody (anti-rabbit, anti-mouse IgG, HRP-linked; 1:1,000 dilution, 
Cell Signaling Technology or anti-goat IgG, HRP-linked; 1:1,000 dilution; Santa 
Cruz), proteins were detected using chemiluminescence (Pierce). Immuno- 
precipitations were performed overnight at 4°C in 1% NP-40 lysis buffer, as 
described above, at a concentration of 1 µg µl−1 total protein using an antibody 
recognizing C-RAF (1:50; Cell Signaling Technology). Antibody:antigen complexes 
were bound to Protein A agarose (25 µl, 50% slurry; Pierce) for 2 h at 4 °C. 
Beads were centrifuged and washed three times in lysis buffer and eluted and 
denatured (95 °C) in 2× reduced sample buffer (Invitrogen). Immunoblots were 
performed as above. Phospho-protein quantification was performed using NIH 
ImageJ. 

Lysates from tumour and matched normal skin were generated by mechanical 
homogenization of tissue in RIPA (50 mM Tris (pH 7.4), 150 mM NaCl, 1 mM
EDTA, 0.1% SDS, 1.0% NaDOC (sodium deoxycholate), 1.0% Triton X-100,
25 mM NaF, 1 mM Na3VO4) containing protease and phosphatase inhibitors,
as above. Subsequent normalization and immunoblots were performed as above. 
Biopsied melanoma tumour material. Biopsied tumour material consisted of 
discarded and de-identified tissue that was obtained with informed consent and 
characterized under protocol 02-017 (paired samples, Massachusetts General 
Hospital) and 07-087 (unpaired sample, Dana-Farber Cancer Institute). For 
paired specimens, ‘on-treatment’ samples were collected 10–14 days after
initiation of PLX4032 treatment (Supplementary Table 4).

Inhibition of COT kinase activity. Adherent RPMI-7951 cells were washed twice 
with 1× PBS and incubated overnight in serum-free growth media. Subsequently,
4-(3-chloro-4-fluorophenylamino)-6-(pyridin-3-yl-methylamino)-3-cyano-[1,7]-
naphthyridine (EMD; TPL2 inhibitor I; catalogue number 616373, PubChem ID:
9549300), suspended in DMSO at the indicated concentration, was added to cells
for 1 h, after which protein extracts were made as described above.

Quantitative RT-PCR. mRNA was extracted from cell lines and fresh-frozen
tumours using the RNeasy kit (Qiagen). Total mRNA was used for subsequent
reverse transcription using the SuperScript III First-Strand Synthesis SuperMix
(Invitrogen) for cell lines and unpaired tumour samples, and the SuperScript
VILO cDNA synthesis kit (Invitrogen) for paired frozen tumour samples. The
reverse transcription reaction (5 µl) was used for quantitative PCR using SYBR
Green PCR Master Mix and gene-specific primers, in triplicate, using an ABI 7300
Real Time PCR System. Primers used for detection are as follows: COT forward:
5'-CAAGTGAAGAGCCAGCAGTTT-3'; COT reverse:
5'-GCAAGCAAATCCTCCACAGTTC-3'; TBP forward: 5'-CCCGAAACGCCGAATATAATCC-3';
TBP reverse: 5'-GACTGTTCTTCACTCTTGGCTC-3'; GAPDH forward:
5'-CATCATCTCTGCCCCCTCT-3'; GAPDH reverse:
5'-GGTGCTAAGCAGTTGGTGGT-3'.

In vitro kinase assay. In vitro kinase assays were performed as described
previously’* using 1 µg each of COT (amino acids 30-397, R&D Systems) and inactive
ERK1 (Millipore). 

Cellular viability assays. Adherent RPMI-7951 cells were infected with virus 
expressing shRNAs against COT or Luciferase as described above. Following 
selection, cells were plated (1.5 X 10° cells per well) onto a 24-well plate in quad- 
ruplicate. Viable cells were counted via trypan blue exclusion using a VI-CELL Cell 
Viability Analyser, following manufacturer’s specifications. Quadruplicate cell 
counts were averaged and normalized relative to that of the control shRNA. 
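
The normalization step described above amounts to averaging the replicate counts for each shRNA and dividing by the average for the control shRNA. A minimal Python sketch, assuming the counts are held in a dictionary keyed by shRNA name (the function name, data layout and example counts are hypothetical):

```python
from statistics import mean


def relative_viability(counts_by_shrna, control="shLuc"):
    """Average replicate viable-cell counts and express them relative to the control.

    counts_by_shrna -- dict mapping shRNA name to a list of replicate counts
    control         -- key of the control shRNA (shLuc in the text)
    """
    control_mean = mean(counts_by_shrna[control])
    return {name: mean(counts) / control_mean
            for name, counts in counts_by_shrna.items()}


# Hypothetical quadruplicate counts, for illustration only.
example_counts = {
    "shLuc": [100, 100, 100, 100],
    "shCOT(1)": [40, 50, 60, 50],
}
viability = relative_viability(example_counts)
```

By construction the control shRNA has a relative viability of 1, and each knockdown is read as a fraction of that value.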
The Cancer Cell Line Encyclopedia (CCLE). The Cancer Cell Line Encyclopedia 
(CCLE) project is a collaboration between the Broad Institute, the Novartis 
Institutes for Biomedical Research (NIBR) and the Genomics Institute of the 
Novartis Research Foundation (GNF) to conduct a detailed genetic and pharmacologic
characterization of a large panel of human cancer models, to develop
integrated computational analyses that link distinct pharmacologic vulnerabilities
to genomic patterns and to translate cell line integrative genomics into cancer 
patient stratification. Chromosomal copy number and gene expression data used 
for this study are available online at http://www.broadinstitute.org/cgi-bin/cancer/ 
datasets.cgi. 

Expression profiling of cancer cell lines. We carried out oligonucleotide micro- 
array analysis using the GeneChip Human Genome U133 Plus 2.0 Affymetrix 
expression array (Affymetrix). Samples were converted to labelled, fragmented, 
cRNA following the Affymetrix protocol for use on the expression microarray. 
shRNA constructs used (pLKO.1). The shRNA constructs used were shLuc
(TRCN0000072243, 5'-CTTCGAAATGTCCGTTCGGTT-3'), shBRAF(1)
(TRCN0000006289, NM_004333.2-1106s1c1, 5'-CTTCGAAATGTCCGTTCGGTT-3'),
shBRAF(2) (TRCN0000006291, NM_004333.2-2267s1c1, 5'-GCTGGTTTCCAAACAGAGGAT-3'),
shCRAF(1) (TRCN0000001066, NM_002880.x-1236s1c1, 5'-CGGAGATGTTGCAGTAAAGAT-3'),
shCRAF(2) (TRCN0000001068, NM_002880.x-1529s1c1, 5'-GAGACATGAAATCCAACAATA-3'),
shMEK1(1) (TRCN0000002332, NM_002755.x-1015s1c1, 5'-GATTACATAGTCAACGAGCCT-3'),
shMEK1(2) (TRCN0000002329, NM_002755.x-455s1c1, 5'-GCTTCTATGGTGCGTTCTACA-3'),
shMEK2(1) (TRCN0000007007, NM_030662.2-1219s1c1, 5'-TGGACTATATTGTGAACGAGC-3'),
shMEK2(2) (TRCN0000007005, NM_030662.2-847s1c1, 5'-CCAACATCCTCGTGAACTCTA-3'),
shCOT(1) (TRCN0000010013, NM_005204.x-1826s1c1, 5'-CAAGAGCCGCAGACCTACTAA-3')
and shCOT(2) (TRCN0000196518, NM_005204.2-2809s1c1, 5'-GATGAGAATGTGACCTTTAAG-3').


28. Boehm, J. S. et al. Integrative genomic approaches identify IKBKE as a breast
cancer oncogene. Cell 129, 1065–1079 (2007).
Melanomas acquire resistance to B-RAF(V600E)
inhibition by RTK or N-RAS upregulation 


Ramin Nazarian, Hubing Shi, Qi Wang, Xiangju Kong, Richard C. Koya, Hane Lee, Zugen Chen, Mi-Kyung Lee,
Narsis Attar, Hooman Sazegar, Thinle Chodon, Stanley F. Nelson, Grant McArthur, Jeffrey A. Sosman,
Antoni Ribas & Roger S. Lo


Activating B-RAF(V600E) (also known as BRAF) kinase mutations 
occur in ~7% of human malignancies and ~60% of melanomas’. 
Early clinical experience with a novel class I RAF-selective inhibitor, 
PLX4032, demonstrated an unprecedented 80% anti-tumour res- 
ponse rate among patients with B-RAF(V600E)-positive melano- 
mas, but acquired drug resistance frequently develops after initial 
responses’. Hypotheses for mechanisms of acquired resistance to 
B-RAF inhibition include secondary mutations in B-RAF(V600E), 
MAPK reactivation, and activation of alternative survival path- 
ways” °. Here we show that acquired resistance to PLX4032 develops 
by mutually exclusive PDGFRβ (also known as PDGFRB) upregulation
or N-RAS (also known as NRAS) mutations but not through
secondary mutations in B-RAF(V600E). We used PLX4032-resistant 
sub-lines artificially derived from B-RAF(V600E)-positive mela- 
noma cell lines and validated key findings in PLX4032-resistant 
tumours and tumour-matched, short-term cultures from clinical 
trial patients. Induction of PDGFRB RNA, protein and tyrosine 
phosphorylation emerged as a dominant feature of acquired 
PLX4032 resistance in a subset of melanoma sub-lines, patient-derived
biopsies and short-term cultures. PDGFRβ-upregulated
tumour cells have low activated RAS levels and, when treated with 
PLX4032, do not reactivate the MAPK pathway significantly. In 
another subset, high levels of activated N-RAS resulting from muta- 
tions lead to significant MAPK pathway reactivation upon PLX4032 
treatment. Knockdown of PDGFRβ or N-RAS reduced growth of the
respective PLX4032-resistant subsets. Overexpression of PDGFRβ or
N-RAS(Q61K) conferred PLX4032 resistance to PLX4032-sensitive 
parental cell lines. Importantly, MAPK reactivation predicts MEK 
inhibitor sensitivity. Thus, melanomas escape B-RAF(V600E) tar- 
geting not through secondary B-RAF(V600E) mutations but via 
receptor tyrosine kinase (RTK)-mediated activation of alternative 
survival pathway(s) or activated RAS-mediated reactivation of the 
MAPK pathway, suggesting additional therapeutic strategies. 

We selected three B-RAF(V600E)-positive parental (P) cell lines, 
M229, M238 and M249, exquisitely sensitive to PLX4032-mediated 
growth inhibition in vitro and in vivo®, and derived PLX4032-resistant 
(R) sub-lines by chronic PLX4032 exposure. In cell survival assays, 
M229 R, M238 R and M249 R sub-lines displayed strong resistance 
to PLX4032 (GI50, the concentration of drug that inhibits growth of
cells by 50%, not reached up to 10 µM) and paradoxically enhanced
growth at low PLX4032 concentrations, in contrast to parental cells 
(Supplementary Fig. 1a). Morphologically, both M229 R and M238 R 
sub-lines appear flatter and more fibroblast-like compared to their 
parental counterparts, but this morphologic switch was not seen in 
the M249 P versus M249 R4 pair (Supplementary Fig. 2a). 


There were no secondary mutations in the drug target B-RAF(V600E)
observed on bi-directional Sanger sequencing of all 18 B-RAF exons in
15 M229 R (R1-R15), two M238 R (R1 and R2), and one M249 R (R4)
acquired resistant sub-lines (Supplementary Table 1 and Supplementary
Fig. 3a, left column). Based on Sanger sequencing, this lack of
secondary B-RAF(V600E) mutation along with retention of the original
B-RAF(V600E) mutation was confirmed in 16/16 melanoma
tumour biopsies (from 12 patients) with clinically acquired resistance 
to PLX4032 (that is, initial >30% tumour size decrease or partial 
response, as defined by RECIST (response evaluation criteria in solid 
tumours) and subsequent progression on PLX4032 dosing; see exam- 
ples in Supplementary Fig. 4) and 5/5 short-term melanoma cultures 
established from 5 resistant tumours obtained from 4 patients 
(Supplementary Table 2). Given recent reports of B-RAF-selective 
inhibitors having a growth-promoting effect on B-RAF wild-type 
tumour cells’”’, retention of the original B-RAF alleles in PLX4032- 
resistant sub-lines, tissues and cultures indicates that PLX4032 chronic 
treatment did not select for the outgrowth of a pre-existing, minor 
B-RAF wild-type sub-population. Furthermore, immunoprecipitated 
B-RAF kinase activities from resistant sub-lines and short-term cul- 
tures were similarly sensitive to PLX4032 as B-RAF kinase activities 
immunoprecipitated from parental cell lines (Supplementary Fig. 3b; 
Pt48 R and Pt55 R resistance to PLX4032 (ref. 10) and the pre-clinical 
analogue PLX4720 (ref. 11) shown in Supplementary Fig. 5a and b, 
respectively; Pt, patient). These results demonstrate that, in all tested 
acquired resistant cell lines and cultures, the mutated B-RAF(V600E) 
kinase lacks secondary mutations and hence retains its ability to 
respond to PLX4032. 

Given that minority PLX4032-resistant sub-populations in tissues 
may acquire B-RAF(V600E) secondary mutations not detectable by 
Sanger sequencing, we analysed “ultra-deep” (Supplementary Fig. 6) 
and deep (Supplementary Fig. 7) sequences of B-RAF (exons 2-18) 
using the Illumina platform for 9/11 acquired resistant tumour samples 
without tumour-matched short-term cultures (one sample, Pt111-010 
DP2, intentionally analysed by both methods; DP, disease progression). 
Ultradeep B-RAF sequencing of five PLX4032-resistant melanoma 
tissues resulted in every base of exons 2-18 being sequenced at a median 
coverage of 127X (27-128) (Supplementary Fig. 6a and b). The 
known variant, V600E, was detected in all five samples with signifi- 
cantly high non-reference allele frequencies (NAF) (Supplementary 
Fig. 6c). In all five tissues, exon 13, where the T529 gatekeeper residue” 
is located, was independently amplified and uniquely bar-coded twice. 
Rare variants (none at the T529 codon; Supplementary Fig. 6d) 
detected in these independent exon 13 analyses do not overlap and 
helped define the true, signal NAF at >4.81% (Supplementary 
Methods). Furthermore, deep B-RAF (exons 2-18) sequence analysis of
PLX4032-resistant melanoma tissues from a whole exome sequencing 
project resulted in 2,396 base pairs of B-RAF coding regions having 
coverage ≥ 10X (average coverage per exon in each tissue shown in
Supplementary Fig. 7a). After filtering, no position harboured a variant 
with a NAF >4.81%, except for the known V600E mutation in all five 
resistant samples. Together, these data strongly corroborate the lack of 
B-RAF(V600E) secondary mutations during the evolution of PLX4032 
acquired resistance in the majority of patients and their tumours. 
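
The filtering logic described above (keep only positions with adequate coverage whose non-reference allele frequency exceeds the 4.81% signal threshold) can be sketched in Python as follows. The `Site` class, the read counts and the definition of NAF as non-reference reads divided by total reads are assumptions for illustration; the actual pipeline is described in the Supplementary Methods and is not reproduced here.

```python
from dataclasses import dataclass

NAF_THRESHOLD = 0.0481  # signal threshold quoted in the text (>4.81%)
MIN_COVERAGE = 10       # minimum coverage quoted for the deep-sequencing data


@dataclass
class Site:
    """Read counts at one position (hypothetical representation)."""
    label: str
    ref_reads: int
    alt_reads: int

    @property
    def coverage(self) -> int:
        return self.ref_reads + self.alt_reads

    @property
    def naf(self) -> float:
        # Assumed definition: non-reference reads / total reads.
        return self.alt_reads / self.coverage if self.coverage else 0.0


def candidate_variants(sites):
    """Return sites with sufficient coverage and NAF above the threshold."""
    return [s for s in sites
            if s.coverage >= MIN_COVERAGE and s.naf > NAF_THRESHOLD]


# Hypothetical read counts, for illustration only.
sites = [
    Site("V600E", ref_reads=60, alt_reads=40),        # NAF 0.40, retained
    Site("noise", ref_reads=98, alt_reads=2),         # NAF 0.02, filtered out
    Site("low_coverage", ref_reads=3, alt_reads=3),   # coverage 6, filtered out
]
kept = [s.label for s in candidate_variants(sites)]
```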

To begin to understand PLX4032-resistance in vitro, we used phospho- 
specific antibodies to probe the activation status of the RAF downstream 
effectors, MEK1/2 and ERK1/2 (also known as MAP2K1/2 and MAPK3/1,
respectively), in parental versus resistant sub-lines, with and without
PLX4032 (Fig. 1a). As expected, PLX4032 induced dose-dependent
decreases in p-MEK1/2 and p-ERK1/2 in all parental cells. However, 
the pattern of MEK-ERK sensitivity to PLX4032 varied among resistant 
sub-lines, suggesting distinct mechanisms. In contrast to M249 R4, 
which showed strong resistance to PLX4032-induced MEK/ERK inhibi- 
tion (suggesting MAPK reactivation), M229 R5 and M238 R1 were both 
similarly sensitive to PLX4032-induced decreases in the levels of 
p-MEK1/2 and p-ERK1/2. Gene expression profiling (Fig. 1b) further 
supported distinct PLX4032 acquired resistant mechanisms represented 
by M229 R5/M238 R1 versus M249 R4. We first used the gene expres- 
sion alterations responsive to PLX4032 in parental cells to define a 
B-RAF(V600E)-responsive gene signature, which is similar to gene sets 
defined by a MEK1 inhibitor (PD325901)”* and by PLX4720 (ref. 14; 
Supplementary Fig. 8). Concordant with the western blot results (Fig. 1a),
M249 R4 demonstrated striking resistance to PLX4032 treatment with
a gene signature of persistent MEK-ERK activation, whereas both
M229 R5 and M238 R1 retained a PLX4032-sensitive gene signature
(Fig. 1b). These data confirm that M229 R5 and M238 R1 share key 
characteristics of resistance, which are in line with unsupervised clus- 
tering of these two resistant sub-lines in genome-wide, differential 
expression patterns (Supplementary Fig. 9). 
Gene set enrichment analysis demonstrated an enrichment of RTK-controlled
signalling in M229 R5 and M238 R1 but not in M249 R4
(Supplementary Table 3). Unsupervised clustering of the receptor
tyrosine kinome gene expression profiles showed that M229 R5 and
M238 R1 clustered away from M229 and M238 parental cell lines largely
based on higher expression levels of KIT, MET, EGFR and PDGFRβ
(Supplementary Fig. 10a, yellow highlight). RNA upregulation of these
four RTKs was consistently not associated with genomic DNA (gDNA)
copy number gain (Supplementary Fig. 10b). Of these four candidate
RTKs, EGFR and PDGFRβ protein levels were overexpressed (Fig. 2a,
left; Fig. 3b; Supplementary Fig. 10c), but only PDGFRβ displayed elevated
activation-associated tyrosine phosphorylation in a phospho-RTK
array (Fig. 2a, right). PDGFRB RNA upregulation was a common feature
among additional M229 R and M238 R sub-lines (Supplementary Fig.
11a) but could not be observed in any of ten randomly selected parental
melanoma cell lines (Supplementary Fig. 11b). Interestingly, tyrosine
phosphorylation of PDGFRβ correlated with an upregulation of a gene
signature unique to PDGFRβ (ref. 15; Supplementary Fig. 12) but is not
due to mutational activation, as PDGFRB cDNAs derived from M229
R5, M238 R1 and Pt48 R are wild type (Supplementary Table 1).

We then validated our in vitro finding in vivo by studying clinical 
trial patient-derived samples (Supplementary Table 2; Fig. 2b) and 
tumour-matched short-term cultures (Fig. 2c and d). In 4/11 available, 
paired biopsy specimens, the resistant tumours showed a tumour- 
associated overexpression of PDGFRB compared to the baseline 
tumour in the same patients (Fig. 2b; Supplementary Table 2 and 
Supplementary Fig. 13). PDGFRB-positive areas of tissue sections were 
consistently strongly positive for S100 or MART1 (melanoma markers; 
MART1 is also known as MLANA) but lacked CD31 (an endothelial,
platelet, macrophage marker, also known as PECAM1) staining (data
not shown). We were able to validate this finding further in an available
short-term culture (Pt48 R) derived from a PLX4032-resistant,
PDGFRβ-positive tumour. Pt48 R was established from an intracardiac
mass progressing 6 months after initiating treatment with PLX4032. 


Figure 1 | In vitro models of PLX4032 acquired 
resistance display differential MAPK 
reactivation. a, Parental and PLX4032-resistant 
sub-lines were treated with increasing PLX4032 
concentration (0, 0.01, 0.1, 1 and 10 µM), and the
effects on MAPK signalling were determined by 
immunoblotting for p-MEK1/2 and p-ERK1/2 
levels. Total MEK1/2, ERK1/2 and tubulin levels, 
loading controls. b, Heat map for B-RAF(V600E) 
signature genes in each of the cell lines treated with 
DMSO or PLX4032. Colour scale, log2-transformed
expression (red, high; green, low) for
each gene (row) normalized by the mean of all 
samples. Blue box showing M249 R4 MAPK 
reactivation. Yellow box showing diminished, 
baseline expression of B-RAF(V600E) signature 
genes in M229 and M238 resistant sub-lines 
(FDR < 0.05). The probeset number is shown after 
Figure 2 | PDGFRβ upregulation is strongly
correlated with PLX4032 acquired resistance.
a, Left, total levels of PDGFRB and EGFR. A431, an
EGFR-amplified cell line. Tubulin levels, loading
control. Right, whole-cell extracts were incubated
on the RTK antibody arrays, and phosphorylation
status was determined by subsequent incubation
with anti-phosphotyrosine horseradish peroxidase
(each RTK spotted in duplicate, positive controls in
corners, gene identity below). b, Anti-PDGFRB
immunohistochemistry of formalin-fixed,
paraffin-embedded tissues. Prostate, negative
control; placenta, positive control. Black bar,
50 µm. c, Relative RNA levels of PDGFRβ in M229
P/R5 and Pt48 R as determined by real-time,
quantitative PCR (average of duplicates). d, Total
PDGFRβ (left) and p-RTK (right) levels in Pt48 R
versus M229 R5.
The Pt48 R short-term culture demonstrated clear overexpression of
PDGFRβ RNA (Fig. 2c), protein and p-Tyr levels (Fig. 2d).

In M249 R4, we sequenced all exons of N-RAS, K-RAS (also known 
as KRAS) or H-RAS (also known as HRAS) (to include codons 12, 13, 
and 61 as well as mutational hotspots of emerging significance’®) and 
MEK1 (ref. 17; Supplementary Table 1 and data not shown) because
we proposed a resistance mechanism reactivating MAPK despite not 
having a secondary B-RAF mutation. Interestingly, M249 R4 harbours 
a N-RAS(Q61K) activating mutation not present in the parental M249 
cell line (Fig. 3a). We found N-RAS mutations in 2/16 acquired resist- 
ant biopsy samples (note that both came from Pt55; Supplementary 
Table 2 and Supplementary Fig. 14). A N-RAS(Q61K) mutated sample, 
Pt55 DP1 (for disease progression 1) was obtained from a biopsy taken 
from an isolated, nodal metastasis that partially regressed on PLX4032 
but increased in size 10 months after starting on therapy with PLX4032
(Supplementary Fig. 4a). This patient continued on therapy with
PLX4032 until 6 months later, when several other nodal metastases 
developed (Supplementary Fig. 14a, b). Analysis of a biopsy taken at a 
second progression site (Pt55 DP2) demonstrated a different mutation 
in N-RAS, N-RAS(Q61R) (Supplementary Fig. 14b). Both Pt55 DP1 
and DP2 tissue N-RAS mutations were confirmed in their respective 
short-term cultures, Pt55 R and Pt55 R2 (Fig. 3a and Supplementary 
Fig. 14b). Also, both DP1 and DP2 (and their respective cultures) 
harboured increased N-RAS gDNA copy numbers (Supplementary 
Fig. 14c and d). Both Pt55 R and Pt55 R2 also showed increased 
N-RAS RNA (Supplementary Fig. 14e) and protein levels (Fig. 3b). 
In addition, N-RAS(Q61K) mutation in M249 R4 and Pt55 R corre- 
lated with a marked increase in activated N-RAS levels (Fig. 3b). Of 
note, the N-RAS mutations were mutually exclusive with PDGFRB 
overexpression in all samples (Supplementary Table 2). 
Knockdown of PDGFRβ or N-RAS using small interfering RNA
(siRNA) pools preferentially growth-inhibited melanoma cells with 
upregulated PDGFRB or N-RAS, respectively (Supplementary Fig. 15a, 
b and Supplementary Table 4). We then selected two resistant sub-lines 
Figure 3 | N-RAS upregulation correlates with a distinct subset of PLX4032
acquired resistance. a, Detection of a N-RAS(Q61K) allele in M249 R4 and
Pt55 R. b, The levels of activated RAS (aRAS) and N-RAS (aN-RAS) eluted after
pull-down using the RAS-binding domain (RBD) of RAF-1. The total levels of
RAS, N-RAS, PDGFRβ and tubulin (loading control) from the whole-cell
lysates are shown by immunoblotting. Effects of GDP and GTPγS pre-incubation
on RBD pull-down and beads without RBD pull-down from Pt48 R
lysates are shown as controls.


or cultures to test the effects of individual PDGFRβ and N-RAS short
hairpin RNAs (shRNAs; Fig. 4a and b, respectively). Stable knockdown
of PDGFRβ caused an admixture of G0/G1 cell cycle arrest (in a MEK
inhibitor-dependent manner due to compensatory signalling;
Supplementary Fig. 16a and data not shown) and apoptosis in M229
R5 and a G0/G1 cell cycle arrest in M238 R1. This effect was specific, as
stable PDGFRβ knockdown in M249 R4 and Pt55 R did not result in
G0/G1 cell cycle arrest (Supplementary Fig. 17a). In contrast, stable 
N-RAS knockdown resulted in a predominantly apoptotic response in 
M249 R4 and Pt55 R (Fig. 4b) but not in M229 R5, M238 R1 or Pt48 R 
(Supplementary Fig. 17b). Moreover, stable N-RAS knockdown markedly 
conferred PLX4032 sensitivity to M249 R4 and Pt55 R but had no effect 
on M229 R5 PLX4032 resistance (Supplementary Fig. 18a). Flag-N-RAS(Q61K)
stable overexpression conferred PLX4032 resistance in
the M249 parental cell line (Supplementary Fig. 18b), whereas stable 
PDGFRB-MYC overexpression conferred reduced PLX4032 sensitivity 
in both M229 and M238 parental cell lines (Supplementary Fig. 19). 
We then asked whether N-RAS-dependent growth and reactivation 
of the MAPK pathway (Fig. 1a and Supplementary Fig. 20) would
selectively sensitize M249 R4 and Pt55 R to MEK inhibition. Indeed, 
whereas the growth of M229 R5, M238 R1 and Pt48 R was uniformly 
highly resistant to the MEK inhibitor AZD6244 (and U0126, Sup- 
plementary Fig. 21), the growth of M249 R4 and Pt55 R was sensitive 
to MEK inhibition in the presence of PLX4032 (Fig. 4c) or absence of 
PLX4032 (Supplementary Fig. 22). It is known that activated N-RAS in 
melanoma cells uses C-RAF (also known as RAF1) over B-RAF to 


Figure 4 | PDGFRβ- and N-RAS-mediated
growth and survival pathways differentially 
predict MEK inhibitor sensitivity. 
signal to MEK-ERK"’. Thus, N-RAS activation would be capable of
bypassing PLX4032-inhibited B-RAF, reactivating the MAPK path- 
way. It is worth noting that PDGFRB-upregulated, PLX4032-resistant 
melanoma sub-lines (M229 R5 and M238 R1) and culture (Pt48 R) are 
resistant not only to AZD6244 but also to imatinib, which is at least 
partially due to rebound, compensatory survival signalling (Supplemen- 
tary Fig. 23 and unpublished observations, H.S. and R.S.L.). 

We propose (Supplementary Fig. 1b) that B-RAF(V600E)-positive 
melanomas, instead of accumulating B-RAF(V600E) secondary muta- 
tions, can acquire PLX4032 resistance by (1) activating an RTK 
(PDGFRβ)-dependent survival pathway in addition to MAPK, or (2)
reactivating the MAPK pathway via N-RAS upregulation. These two 
mechanisms account for acquired PLX4032 resistance in 5/12 patients 
in our study cohort, and additional mechanisms await future discovery. 
Some patients who relapse on PLX4032 are already being enrolled in a 
phase II MEK inhibitor trial (ClinicalTrials.gov identifier NCT01037127) 
based on the assumption of MAPK reactivation. Our findings directly 
imply a strategy to stratify patients who relapse on PLX4032 and should 
prompt a search for rational combinations of targeting agents most 
optimal for distinct mechanisms of acquired resistance to PLX4032 as 
well as other B-RAF inhibitors (for example, GSK2118436) in clinical 
development. 


METHODS SUMMARY 

Cell culture, infections and compounds. Cells were maintained in Dulbecco's 
modified Eagle medium (DMEM) with 10 or 20% fetal bovine serum and glutamine. 
shRNAs were sub-cloned into the lentiviral vector pLL3.7 and infections carried out 
with protamine sulphate. Stocks of PLX4032 (Plexxikon) and AZD6244 (commercially 
available) were made in DMSO. Cells were quantified using CellTiter-GLO 
Luminescence (Promega). 

Protein detection. Western blots were probed with antibodies against p-MEK1/2 
(S217/221), MEK1/2, p-ERK1/2 (T202/Y204), ERK1/2, PDGFRB, and EGFR (Cell 
Signaling Technologies), and N-RAS (Santa Cruz Biotechnology), pan-RAS (Thermo 
Scientific) and tubulin (Sigma). p-RTK arrays were performed according to the man- 
ufacturer’s recommendations (Human Phospho-RTK Array Kit, R&D Systems). For 
PDGFRβ immunohistochemistry, paraffin-embedded formalin-fixed tissue sections
were antigen-retrieved, incubated with a PDGFRβ antibody followed by horseradish
peroxidase-conjugated secondary antibody (Envision System, DakoCytomation). 
Immunocomplexes were visualized using the DAB (3,3’-diaminobenzidine) 
peroxidase method and nuclei haematoxylin-counterstained. For activated RAS 
pull-down, lysates were incubated with beads coupled to glutathione-S-transferase 
(GST)-RAF-1-RAS-binding domain of RAF1 (RBD) (Thermo) for 1h at 4°C. 
RNA quantifications. For real-time quantitative PCR, total RNA was extracted 
and cDNA quantified. Data were normalized to tubulin and GAPDH levels. 
Relative expression is calculated using the delta-Ct method. For RNA expression 
profiling, total RNAs were extracted, and generated cDNAs were fragmented, 
labelled and hybridized to the GeneChip Human Gene 1.0 ST Arrays 
(Affymetrix). Expression data were normalized, background-corrected, and 
log2-transformed for parametric analysis. Differentially expressed genes were
identified using significance analysis of microarrays (SAM) with the R package 
‘samr’ (false discovery rate (FDR) < 0.05; fold change > 2). 
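
The two selection cut-offs quoted above (FDR < 0.05 and fold change > 2) can be applied to a results table as in the following Python sketch. It assumes that q-values have already been computed elsewhere (for example by SAM) and that the fold-change cut-off is applied symmetrically to up- and down-regulated genes; the function name, row layout and gene names are hypothetical.

```python
import math


def differentially_expressed(rows, fdr_cutoff=0.05, fold_cutoff=2.0):
    """Select genes passing both the FDR and the fold-change cut-offs.

    rows -- iterable of (gene, expr_a, expr_b, q_value) with linear-scale,
            strictly positive expression values (hypothetical layout)
    Returns a list of (gene, log2 fold change of b over a).
    """
    hits = []
    for gene, expr_a, expr_b, q_value in rows:
        log2_fc = math.log2(expr_b) - math.log2(expr_a)
        # Assumed: the cut-off applies in both directions (up or down).
        if q_value < fdr_cutoff and abs(log2_fc) > math.log2(fold_cutoff):
            hits.append((gene, log2_fc))
    return hits


# Hypothetical values, for illustration only.
rows = [
    ("GENE_A", 100.0, 800.0, 0.01),  # 8-fold up, significant
    ("GENE_B", 100.0, 150.0, 0.01),  # 1.5-fold, below the fold cut-off
    ("GENE_C", 100.0, 25.0, 0.20),   # 4-fold down, fails the FDR cut-off
    ("GENE_D", 400.0, 100.0, 0.03),  # 4-fold down, significant
]
hits = differentially_expressed(rows)
```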

Cell cycle and apoptosis. For cell cycle analysis, cells were fixed, permeabilized 
and stained with propidium iodide (BD Pharmingen). Cell cycle distribution was 
analysed by CellQuest Pro and ModFit software. For apoptosis, cells were co-stained
with Annexin V-V450 and propidium iodide (BD Pharmingen). Data were
analysed with the FACS Express V2 software. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS

Cell culture, lentiviral constructs and infections. All cell lines were maintained 
in DMEM with 10% or 20% (short-term cultures) heat-inactivated FBS (Omega 
Scientific) and 2 mmol l⁻¹ glutamine in a humidified, 5% CO2 incubator. To derive
PLX4032-resistant sub-lines, M229 and M238 were seeded at low cell density and
treated with PLX4032 at 1 µM every 3 days for 4-6 weeks and clonal colonies were
then isolated by cylinders. M249 R was derived by successive titration of PLX4032
up to 10 µM. PLX4032-resistant sub-lines and short-term cultures were replenished
with 1 µM PLX4032 every 2 to 3 days. shRNAs were sub-cloned into the lentiviral
vector pLL3.7. N-RAS(Q61K) mutant overexpression construct was made by PCR- 
amplifying from M249 R4 cDNA and sub-cloning into the lentiviral vector (UCLA 
Vector Core), creating pRRLsin.cPPT.CMV.hTERT.IRES.GFP-Flag-Q61K-NRAS.
Wild-type PDGFRβ overexpression construct was PCR-amplified from cDNA
and sub-cloned into a lentiviral vector (Clontech), creating pLVX-Tight-Puro-PDGFRβ-Myc.
Lentiviral constructs were co-transfected with three packaging plasmids
into HEK293T cells. Infections were carried out with protamine sulphate.
Cellular proliferation, drug treatments and siRNA transfections. Cell prolifera- 
tion experiments were performed in a 96-well format (five replicates), and baseline 
quantification performed at 24h after cell seeding along with initiation of drug 
treatments (72 h). Stocks and dilutions of PLX4032 (Plexxikon), AZD6244 (Selleck 
Chemicals) and U0126 (Promega) were made in DMSO. siRNA pool (Dharmacon)
transfections were carried out in 384-well format. TransIT transfection reagent 
(Mirus) was added to each well and incubated at 37 °C for 20 min. Subsequently, 
cells were reverse transfected, and the mixture was incubated for 51-61 h at 37 °C. 
Cells were quantified using CellTiter 96 Aqueous One Solution (Promega) or 
CellTiter-GLO Luminescence (Promega) following the manufacturer’s recom- 
mendations. 

Protein detection. Cell lysates for western blotting were made in RIPA (Sigma) 
with protease inhibitor cocktail (Roche) and phosphatase inhibitor cocktails I and 
II (Santa Cruz Biotechnology). Western blots were probed with antibodies against 
p-MEK1/2 (S217/221), total MEK1/2, p-ERK1/2 (T202/Y204), total ERK1/2, 
PDGFRβ, and EGFR (all from Cell Signaling Technologies), B-RAF and N-RAS
(Santa Cruz Biotechnology), pan-RAS (Thermo Scientific) and tubulin (Sigma). 
p-RTK arrays were performed according to the manufacturer’s recommendations 
(Human Phospho-RTK Array Kit, R&D Systems). For PDGFRB immunohisto- 
chemistry, paraffin-embedded formalin fixed tissue sections were subjected to 
antigen retrieval and incubated with a rabbit monoclonal anti-PDGFRβ antibody
(Cell Signaling Technology) followed by labelled anti-rabbit polymer horseradish 
peroxidase (Envision System, Dako Cytomation). Immunocomplexes were visua- 
lized using the DAB (3,3'-diaminobenzidine) peroxidase method and nuclei 
haematoxylin-counterstained. 

In vitro kinase assay. Cells were harvested and protein lysates prepared in a 
NP40-based buffer before being subjected to immunoprecipitation (IP). IP beads were
then resuspended in ADBI buffer (with Mg/ATP cocktail) and incubated with an
inactive, recombinant MEK1 or a truncated RAF-1 (positive control) (Millipore),
and with DMSO or 1 µM PLX4032 for 30 min at 30 °C. The beads were subsequently
pelleted and the supernatant resuspended in sample buffer for western
blotting to detect p-MEK and total MEK. 

Activated RAS pull-down assay. Melanoma lysates were incubated with glutathione
agarose beads coupled to 80 µg GST-RAF-1-RBD (Thermo) for 1 h at
4 °C. As controls, Pt48 R lysate was pre-incubated with either 0.1 mM GTPγS
(positive control) or 1 mM GDP (negative control) in the presence of 10 mM
EDTA (pH 8.0) at 30 °C for 15 min. Reactions were terminated by adding
60 mM MgCl2. After washing with Wash Buffer (Thermo), proteins bound to
beads were eluted by protein sample buffer. RAS or NRAS levels were detected 
by immunoblotting. 

Quantitative real-time PCR for relative RNA levels. Total RNA was extracted 
using the RiboPure Kit (Ambion), and reverse transcription reactions were per- 
formed using the SuperScript First-Strand Synthesis System (Invitrogen). Real- 
time PCR analyses were performed using the iCycler iQ Real Time PCR Detection 
System (BioRad) (Supplementary Table 5). To discriminate specific from non- 
specific cDNA products, a melting curve was obtained at the end of each run. Data 
were normalized to tubulin and/or GAPDH levels in the samples in duplicates. 
Relative expression is calculated using the delta-Ct method using the following
equations: ΔCt(Sample) = Ct(Target) − Ct(Reference); relative quantity = 2^(−ΔCt).
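
As a worked illustration of the delta-Ct calculation, the Python sketch below computes ΔCt from a target and a reference Ct value and converts it to a relative quantity as 2 raised to the power of minus ΔCt, the standard form of the delta-Ct method. The function names and Ct values are hypothetical, and replicate averaging is omitted for brevity.

```python
def delta_ct(ct_target, ct_reference):
    """ΔCt(sample) = Ct(target) - Ct(reference)."""
    return ct_target - ct_reference


def relative_quantity(ct_target, ct_reference):
    """Relative quantity = 2 ** (-ΔCt), assuming roughly 100% PCR efficiency."""
    return 2.0 ** (-delta_ct(ct_target, ct_reference))


# Hypothetical Ct values, for illustration only.
parental = relative_quantity(ct_target=24.0, ct_reference=20.0)   # ΔCt = 4
resistant = relative_quantity(ct_target=22.0, ct_reference=20.0)  # ΔCt = 2
fold_change = resistant / parental
```

With these example values a 2-cycle drop in ΔCt corresponds to a 4-fold higher relative RNA level, since each cycle represents an approximate doubling of product.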
Quantitative real-time PCR for relative DNA copy numbers. gDNAs were 
extracted using the FlexiGene DNA Kit (Qiagen) (Human Genomic DNA- 
Female, Promega). NRAS relative copy number was determined by quantitative 
PCR (cycle conditions available upon request) using the MyiQ single colour Real- 
Time PCR Detection System (Bio-Rad). Total DNA content was estimated by 
assaying β-globin for each sample (Supplementary Table 5), and 20 ng of gDNA
was mixed with the SYBR Green QPCR Master Mix (Bio-Rad) and 2 µmol l⁻¹ of
each primer. 


Sequencing. gDNAs were isolated using the FlexiGene DNA Kit (QIAGEN) or
the QIAamp DNA FFPE Tissue Kit. B-RAF and RAS genes were amplified from
genomic DNA by PCR. PCR products were purified using QIAquick PCR 
Purification Kit (QIAGEN) followed by bi-directional sequencing using BigDye 
v1.1 (Applied Biosystems) in combination with a 3730 DNA Analyzer (Applied 
Biosystems). PDGFRβ was amplified from cDNA by PCR and sequenced (primers
listed in Supplementary Table 1). 

B-RAF ultra-deep sequencing. Exon-based amplicons were generated using 
Platinum high-fidelity Taq polymerase, and libraries were prepared following 
the Illumina library generation protocol version 2.3. For each sample, one library 
was generated with 18 exons pooled at equal molarity and another library was 
generated for exon 13 only, for validation purposes. Each library was indexed with
a unique four-base barcode within the custom-made Illumina adaptor. All 10
indexed samples were pooled and sequenced on one lane of an Illumina GAIIx flowcell
as single-end 76-base-pair reads. For error rate estimation, the phiX174 genome was
spiked in. Base-calling was performed by Illumina RTA version 1.8.70. Alignment
was performed using the Novocraft Short Read Alignment Package version 2.06 
(http://www.novocraft.com/index.html). First, all reads were aligned to the 
phiX174 reference genome downloaded from the NCBI. The mismatch rate at
each read position was calculated to estimate the error rate of the sequencer, and
the background threshold was set at 1.67% (the mean plus five standard deviations, s.d.)
on the basis of the phiX genome data (mean error rate = 0.57%, s.d. = 0.22%). Then, the
.qseq.txt files were converted into a .fastq file using a custom script (available on
request). During this process, the first 5 bases (the unique 4-base barcode and the T at
the fifth position) were stripped from the reads and appended to the read name. The
.fastq file was then split into 10 .fastq files, one per barcode, and only reads whose
first 5 bases perfectly matched one of the 10 barcodes were included. Each .fastq file
was aligned to a chromosome 7 FASTA file, generated from the Human Genome
reference sequence (hg18, March 2006, build 36.1) downloaded from the Broad 
Institute (ftp://ftp.broadinstitute.org/pub/gsa/gatk_resources.tgz) using the Novoalign 
program. Base calibration option was used, and the output format was set to SAM. 
Using SAMtools (http://samtools.sourceforge.net/), the .sam files of each lane were 
converted to .bam files and sorted, followed by removal of potential PCR duplicates 
using Picard (http://picard.sourceforge.net/). The true background rate was 
inferred from analysis of independent exon 13 amplicons. None of the 14 positions
within exon 13 that had a non-reference allele frequency (NAF) > 1.67% in the
all-exon samples was validated in the exon 13-only samples, and the same held in
reverse for the one such position in the exon 13-only sample, implying that the true
background error rate could be as high as 4.81% (5 s.d., mean error rate = 2.72%,
s.d. = 0.4%). In total, 12 positions had NAF > 4.81%, and none of them recurred at the
same position. We note that the four sample gDNAs extracted from formalin-fixed
paraffin-embedded (FFPE) blocks had 5–6 times more variants with NAF above
background than the sample extracted from frozen tissue, and the 12 positions with
NAF > 4.81% were scattered only across the FFPE samples. The numbers of
variants within and outside the kinase domain were not significantly different.

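The barcode handling and the background cut-off described above can be sketched in Python as follows. This is not the authors' custom script, which is available only on request. The read tuples, the `#` separator used in the read name and the example barcodes are assumptions made for illustration.

```python
from collections import defaultdict


def background_threshold(mean_pct, sd_pct, n_sd=5.0):
    """Cut-off on non-reference allele frequency: mean error rate plus n_sd s.d."""
    return mean_pct + n_sd * sd_pct


def demultiplex(reads, barcodes, tag_len=5):
    """Split reads by barcode and strip the leading tag.

    reads    -- iterable of (name, sequence, quality) tuples
    barcodes -- 4-base barcodes; each is expected to be followed by a T
    Only reads whose first tag_len bases match a barcode + "T" exactly are kept.
    """
    expected = {barcode + "T": barcode for barcode in barcodes}
    bins = defaultdict(list)
    for name, seq, qual in reads:
        tag = seq[:tag_len]
        barcode = expected.get(tag)
        if barcode is None:
            continue  # imperfect barcode match: discard the read
        # Move the tag from the read into the read name.
        bins[barcode].append((f"{name}#{tag}", seq[tag_len:], qual[tag_len:]))
    return dict(bins)


# Hypothetical reads and barcodes.
reads = [
    ("r1", "ACGTTGGGG", "IIIIIIIII"),
    ("r2", "ACGAAGGGG", "IIIIIIIII"),  # no perfect match, so it is dropped
]
print(demultiplex(reads, ["ACGT", "TTAG"]))
print(round(background_threshold(0.57, 0.22), 2))
```

With the phiX values quoted above (mean 0.57%, s.d. 0.22%), the five-s.d. cut-off evaluates to 1.67%.
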
B-RAF deep sequence from whole exome sequence analysis. Genomic libraries 
were generated following the Agilent SureSelect Human All Exon Kit Illumina 
Paired-End Sequencing Library Prep Version 1.0.1 protocol at the UCLA Genome 
Center. Agilent SureSelect All Exon ICGC version was used for capturing 
~50 megabase (Mb) exome. The Genome Analyzer IIx (GAIIx) was run using 
standard manufacturer’s recommended protocols. Base-calling was done by 
Illumina RTA version 1.6.47. Two lanes of Illumina single end (SE) run were 
generated for each of Pt111-001 normal, baseline and DP2 samples, and one lane 
of Illumina paired end (PE) run was generated for each of Pt111-001 DP1, DP3 as 
well as Pt111-010 normal, baseline, DP1 and DP2 samples. Alignment was per- 
formed using the Novocraft Short Read Alignment Package version 2.06. Human 
Genome reference sequence (hg18, March 2006, build 36.1), downloaded from the 
UCSC genome database located at http://genome.ucsc.edu and mirrored locally, 
was indexed using the novoindex program (-k 14 -s 3). The Novoalign program was used
to align each lane’s qseq.txt file to the reference genome. The base calibration option
and, for paired-end runs, the adaptor stripping option were used, and the output format
was set to SAM. Using SAMtools (http://samtools.sourceforge.net/), the .sam files 
of each lane were converted to .bam files, sorted and merged for each sample and 
potential PCR duplicates were removed using Picard (http://picard.sourceforge.net/).
The .bam files were filtered for SNV calling and small INDEL calling to
reduce the likelihood of using spuriously mis-mapped reads to call the variants. 
For the .bam file to call SNVs, the last 5 bases were trimmed and only the reads 
lacking indels were retained. For the .bam file to call small INDELs, only the reads 
containing one contiguous INDEL but not positioned at the beginning or the end 
of the read were retained. The SOAP consensus-calling model implemented in
SAMtools was used to call the variants, both SNVs and indels, and to generate the
pileup files for each .bam file. Coding regions ± 2 bp of the B-RAF gene were extracted
from the .pileup files and the reads were manually examined for rare variants
(non-reference alleles).
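The two read filters described above can be sketched in Python on SAM-style CIGAR strings. This is an illustration under stated assumptions rather than the pipeline that was actually run: it assumes indels are recognised from the CIGAR I and D operations, ignores soft and hard clipping when deciding whether an indel sits at a read end, and trims the last bases of the stored sequence without regard to strand.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Turn a CIGAR string such as '30M2D46M' into [(30, 'M'), (2, 'D'), (46, 'M')]."""
    return [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]


def keep_for_snv(cigar):
    """SNV set: keep only reads whose alignment contains no insertion or deletion."""
    return not any(op in "ID" for _, op in parse_cigar(cigar))


def trim_tail(seq, qual, n=5):
    """Drop the last n bases (and qualities) of a read; n must be positive."""
    return seq[:-n], qual[:-n]


def keep_for_indel(cigar):
    """INDEL set: keep reads with exactly one indel that is not at either read end."""
    ops = [(length, op) for length, op in parse_cigar(cigar) if op not in "SH"]
    indel_positions = [i for i, (_, op) in enumerate(ops) if op in "ID"]
    if len(indel_positions) != 1:
        return False
    i = indel_positions[0]
    return 0 < i < len(ops) - 1


for cigar in ["76M", "30M2D46M", "2I74M", "20M1I20M1D35M"]:
    print(cigar, keep_for_snv(cigar), keep_for_indel(cigar))
```
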

Microarray data generation and analysis. Total RNAs were extracted using the 
RiboPure Kit (Ambion) from cells (DMSO or PLX4032, 1 µM, 6 h). cDNAs were
generated, fragmented, biotinylated, and hybridized to the GeneChip Human Gene 
1.0 ST Arrays (Affymetrix). The arrays were washed and stained on a GeneChip 
Fluidics Station 450 (Affymetrix); scanning was carried out with the GeneChip 
Scanner 3000 7G; and image analysis with the Affymetrix GeneChip Command 
Console Scan Control. Expression data were normalized, background-corrected, and 
summarized using the RMA algorithm implemented in the Affymetrix Expression 
Console™ version 1.1. Data were log-transformed (base 2) for parametric analysis.
Clustering was performed with MeV 4.4, using unsupervised hierarchical clustering 
analysis on the basis of Pearson correlation and complete/average linkage clustering. 
Differentially expressed genes were identified using significance analysis of micro- 
arrays (SAM) with the R package ‘samr’ (R 2.9.0; FDR < 0.05; fold change greater 
than 2). To identify and rank pathways enriched among differentially expressed 
genes, P-values (Fisher’s exact test) were calculated for gene sets with at least 20% 
differentially expressed genes. Curated gene sets of canonical pathways in the 
Molecular Signatures Database (MSigDB) were used. 
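The pathway ranking step can be illustrated with a small Python sketch. The text does not state whether the Fisher's exact test was one- or two-sided; the sketch assumes a one-sided test for enrichment, computed as a hypergeometric tail. The gene names, gene sets and function names are hypothetical.

```python
from math import comb


def fisher_enrichment_p(overlap, set_size, n_de, n_total):
    """One-sided Fisher's exact P-value: P(X >= overlap) under the hypergeometric
    distribution, for a gene set of set_size drawn from n_total genes of which
    n_de are differentially expressed."""
    upper = min(set_size, n_de)
    favourable = sum(
        comb(n_de, i) * comb(n_total - n_de, set_size - i)
        for i in range(overlap, upper + 1)
    )
    return favourable / comb(n_total, set_size)


def rank_pathways(gene_sets, de_genes, universe, min_fraction=0.2):
    """Test gene sets with at least min_fraction differentially expressed genes
    and return (name, P-value) pairs sorted by increasing P-value."""
    universe = set(universe)
    de = set(de_genes) & universe
    results = []
    for name, genes in gene_sets.items():
        members = set(genes) & universe
        if not members:
            continue
        overlap = len(members & de)
        if overlap / len(members) < min_fraction:
            continue
        p = fisher_enrichment_p(overlap, len(members), len(de), len(universe))
        results.append((name, p))
    return sorted(results, key=lambda item: item[1])


# Hypothetical example: 10 genes, 2 of them differentially expressed.
universe = [f"g{i}" for i in range(10)]
gene_sets = {"A": ["g0", "g1"], "B": ["g0", "g2", "g3", "g4", "g5", "g6"]}
print(rank_pathways(gene_sets, ["g0", "g1"], universe))
```

Set B is skipped because only one of its six genes is differentially expressed, which is below the 20% cut-off.
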
Copy number variation analysis. Illumina HumanExon510S-DUO bead arrays
(Illumina) were run following the manufacturer’s protocol. Scanned array
data were imported into BeadStudio software (Illumina), where signal intensities
for samples were normalized against those for reference genotypes. Log2 ratios
were calculated, and data were smoothed using the median with a window size of 10
and a step size of five probes.
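The smoothing step can be sketched in Python as below. The sketch assumes that probes are already ordered by genomic position and that a trailing window with fewer than 10 probes is dropped; neither detail is specified in the text, and the function names are illustrative.

```python
from math import log2
from statistics import median


def log2_ratio(sample_intensity, reference_intensity):
    """Log2 ratio of a normalized sample intensity to its reference intensity."""
    return log2(sample_intensity / reference_intensity)


def median_smooth(values, window=10, step=5):
    """Median of each full window of `window` probes, advancing `step` probes."""
    return [
        median(values[start:start + window])
        for start in range(0, len(values) - window + 1, step)
    ]


# Hypothetical ordered log2 ratios for 20 probes.
ratios = list(range(20))
print(median_smooth(ratios))
```
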

Cell cycle and apoptosis analysis. All infected cells were replenished with 
PLX4032 24 h after infections (M229 R5 treated with AZD6244 to inhibit rebound
p-ERK on PDGFRβ KD), fixed, permeabilized, and treated with RNase (Qiagen).
Cells were stained with 50 mg ml−1 propidium iodide (BD Pharmingen) and the
distribution of cell cycle phases was determined by Cell Quest Pro and ModiFit 
software. For apoptosis, post-infection cells were stained with Annexin V-V450 
(BD Pharmingen) and propidium iodide for 15 min at room temperature. Flow 
cytometry data were analysed by the FACS Express V2 software. 

Image acquisition and data processing. Statistical analyses were performed 
using InStat 3 Version 3.0b (GraphPad Software), and graphical representations 
using DeltaGraph or Prism (Red Rock Software). An Optronics camera system 
was used in conjunction with Image-Pro Plus software (MediaCybernetics) and 
Adobe Photoshop 7.0. 

The multi-subunit DNA-dependent RNA polymerase (RNAP) is
the principal enzyme of transcription for gene expression. 
Transcription is regulated by various transcription factors. Gre 
factor homologue 1 (Gfh1), found in the Thermus genus, is a close 
homologue of the well-conserved bacterial transcription factor 
GreA, and inhibits transcription initiation and elongation by binding
directly to RNAP. The structural basis of transcription
inhibition by Gfh1 has remained elusive, although the crystal structures
of RNAP and Gfh1 have been determined separately. Here
we report the crystal structure of Thermus thermophilus RNAP 
complexed with Gfh1. The amino-terminal coiled-coil domain of 
Gfh1 fully occludes the channel formed between the two central 
modules of RNAP; this channel would normally be used for nuc- 
leotide triphosphate (NTP) entry into the catalytic site. 
Furthermore, the tip of the coiled-coil domain occupies the NTP 
β-γ phosphate-binding site. The NTP-entry channel is expanded,
because the central modules are ‘ratcheted’ relative to each other by 
~7°, as compared with the previously reported elongation com- 
plexes. This ‘ratcheted state’ is an alternative structural state, 
defined by a newly acquired contact between the central modules. 
Therefore, the shape of Gfh1 is appropriate to maintain RNAP in 
the ratcheted state. Simultaneously, the ratcheting expands the 
nucleic-acid-binding channel, and kinks the bridge helix, which 
connects the central modules. Taken together, the present results 
reveal that Gfh1 inhibits transcription by preventing NTP binding 
and freezing RNAP in the alternative structural state. The ratcheted 
state might also be associated with other aspects of transcription, 
such as RNAP translocation and transcription termination. 

RNAP synthesizes RNA complementary to the template DNA (Supplementary
Fig. 1a). Crystallographic studies of RNAPs from thermophilic
bacteria and RNAP II (Pol II) from the yeast Saccharomyces
cerevisiae have revealed the overall structure of RNAP, which resembles
a crab’s claw (Supplementary Fig. 1b). In the transcribing RNAP
(elongation complex, EC), the nascent RNA strand remains bound to
the template DNA strand, forming an 8–9 base pair DNA·RNA hybrid.
The DNA·RNA hybrid and the downstream DNA duplex are tightly
held in the ‘primary channel’ formed between the pincers of the crab
claw, in the EC structures of the S. cerevisiae and T. thermophilus
RNAPs. The catalytic site of nucleotide addition resides at the joint
of the pincers. The substrate NTP is considered to enter the catalytic site
through a pore (the ‘secondary channel’) on the back side of the crab
claw (Supplementary Fig. 1b). The two pincers are connected by a long
α-helix (the bridge helix), which is located at the junction of the
DNA·RNA hybrid-binding site, the downstream DNA-binding site,
and the secondary channel. The bridge helix is inherently flexible, adopting
both the continuously helical and kinked conformations.

In a previous study, we successfully crystallized T. thermophilus
RNAP together with DNA, RNA and Gfh1, and collected X-ray
diffraction data sets for two P2₁ crystals (crystals 1 and 2). The
nucleic-acid scaffolds employed for the crystallization included the
downstream duplex DNA, the DNA·RNA hybrid, and an upstream
RNA hairpin 10 or 11 nucleotides (nt) from the RNA 3′ end (Supplementary
Text 1, Supplementary Fig. 2a). In the present study,
we determined the structures of the quaternary complex of
RNAP·DNA·RNA·Gfh1 (EC·Gfh1) (Fig. 1a, b, Supplementary
Table 1, and Methods). The structures of the three independent
RNAP molecules in the asymmetric units of crystals 1 and 2 are all
similar to each other (Supplementary Text 2, Supplementary Figs
3–5). RNAP and Gfh1 showed clear electron densities (Supplementary
Fig. 6), and thus the inhibition mechanisms of Gfh1 were
unambiguously revealed, as described below. In contrast, the electron
densities of both the DNA and RNA were weak, so we built only
partial models (Supplementary Text 3, Supplementary Figs 2, 7).

The S. cerevisiae Pol II structure consists of four rigid modules,
‘core’, ‘shelf’, ‘clamp’ and ‘jaw-lobe’, which are mobile relative to each
other. The rigid modules of the bacterial RNAP were defined previously,
on the basis of the structures of the Thermus aquaticus core
enzyme and the T. thermophilus holoenzyme (Supplementary
Text 4). However, the conformations of the T. thermophilus RNAP
in the present EC·Gfh1 structures differ appreciably from those in the
previously reported structures of the core enzyme, the holoenzyme and
ECs (Fig. 1c, d). The large conformational differences enabled us to
redefine the rigid-body modules of T. thermophilus RNAP (Fig. 1a, b,
Supplementary Table 2, Supplementary Text 4, Supplementary Figs 8,
9). These rigid modules include the ‘core’, ‘shelf’ and ‘clamp’ modules,
which generally correspond well to those in Pol II. The exceptions are
that the active-site and dock domains belong to the ‘shelf’ module,
rather than to the ‘core’ module, and the β-domain 1 (or the protrusion
domain in Pol II) and the β flap domain (or the wall domain in Pol II)
are not included in the ‘core’ module in T. thermophilus RNAP
(Supplementary Text 4, Supplementary Fig. 8). The core and shelf
modules are the main structural elements that form the primary,
secondary and RNA-exit channels (Fig. 1a, b, Supplementary Text 4).
Gfh1 is accommodated in the secondary channel and its exterior region
(Fig. 1a, b, Supplementary Fig. 6). The N-terminal domain (NTD) of
Gfh1 forms a coiled coil and is inserted into the secondary channel. The
major relative movements among the rigid-body modules are the
‘ratcheting’ between the shelf and core modules and the ‘swinging’ of
the clamp relative to the shelf module (Fig. 1c, d, Supplementary Movies
1, 2), as described below in more detail.

The clamp module is connected to the shelf module by four loops
(switches 1, 2, 4 and 5, Supplementary Table 2). The protruded clamp
module swings relative to the shelf module by about 15° around the
centre near switch 5, and is further tilted by about 5° in EC·Gfh1 (Fig. 1a,
b, Supplementary Fig. 10, Supplementary Movie 2). Consequently,
the region of the primary channel outside the hybrid-binding site is
widened (Supplementary Text 5, Supplementary Fig. 11). Considering
that Gfh1 binding to RNAP occurs on the opposite side of the clamp
module, and that Gfh1 does not directly contact the clamp module, the
clamp swinging seems to be related to the hairpin structure in the RNA
of the nucleic acid scaffolds (Supplementary Text 5).

Figure 1 | Structure of EC·Gfh1. a, b, Overall structure of T. thermophilus
EC·Gfh1 in two orientations. c, d, Superposition of EC·Gfh1 and EC (PDB
2O5I, yellow). The core modules of the two structures are superposed,
minimizing the root mean square deviation (RMSD) between Cα atoms. Two
orientations are shown. The same colour scheme is used in all figures (RNA,
orange; template DNA, blue; Gfh1, purple; shelf module, cyan; core module,
grey; clamp, green; switches 1–5, brown; hinge loop, red; other domains, dark
grey). The three regions of the bridge helix are coloured differently (N-terminal,
dark pink; central, hot pink; C-terminal, violet).

The central body of RNAP is composed of the core and shelf modules,
which are rotated by ~7° relative to each other in the present RNAP
structures, as compared with the previously reported EC state
(Figs 1c, d, 2a, and Supplementary Movie 1). We designate this novel
RNAP state as the ‘ratcheted’ state, and the previous EC state as the ‘tight’
state. The shelf module is attached to the backboard part of the core
module (Fig. 1a), through interfaces of about 3,800 Å² in the ratcheted
state and about 3,600 Å² in the tight state (Fig. 2b, c). The shelf–core
interfaces in the two states mostly overlap (shown in green), but
there are several contact points specific to either the ratcheted state or the
tight state (shown in red and yellow, respectively). The central overlapped
area of the interfaces is mainly hydrophobic, and the rotation
axis of the ratcheting runs along it (Fig. 2b, c). The ratcheting axis forms
an angle of about 50° to the floor of the channels on the core module
(Fig. 2a). The core and shelf modules are connected by three peptide
segments: the bridge helix, the loop consisting of β Tyr 998–Met 1005
(previously designated as ‘switch 3’), and the loop consisting of β′
Ala 779–Ser 782 (designated hereafter as the ‘hinge loop’) (Fig. 2d,
Supplementary Fig. 12). As the ratcheting axis runs close to β′ Pro 781
(Escherichia coli β′ Pro 502) within the hinge loop, the conformational
change due to the ratcheting is negligibly small around this loop (Fig. 2e,
f, Supplementary Fig. 12). By contrast, the bridge helix is much further
from the ratcheting axis than the other two peptide segments
(Fig. 2b, c), and the conformational change is significant, as
described below in more detail.
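A rigid-body rotation such as the ~7° ratcheting is commonly summarized by the angle and axis of the rotation matrix that relates one module's two positions once the other module has been superposed. The Python sketch below shows only that final step, using the standard trace and antisymmetric-part formulas. It is a generic illustration, not the procedure used for the present structures; the test matrix is a synthetic 7° rotation about z.

```python
import math


def rotation_angle_deg(R):
    """Rotation angle in degrees of a 3x3 rotation matrix, from its trace."""
    trace = R[0][0] + R[1][1] + R[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # guard against rounding
    return math.degrees(math.acos(cos_theta))


def rotation_axis(R):
    """Unit rotation axis from the antisymmetric part of R.

    Not defined for rotations of 0 or 180 degrees.
    """
    x = R[2][1] - R[1][2]
    y = R[0][2] - R[2][0]
    z = R[1][0] - R[0][1]
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("axis is undefined for this rotation")
    return (x / norm, y / norm, z / norm)


def rotation_about_z(angle_deg):
    """Synthetic test case: rotation by angle_deg about the z axis."""
    t = math.radians(angle_deg)
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


R = rotation_about_z(7.0)
print(rotation_angle_deg(R), rotation_axis(R))
```
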

On the other hand, the contact points specific to the ratcheted state
are also distant from the ratcheting axis (Fig. 2b, c). In particular, an
α-helix of the shelf module (β′ 685–696) contacts a β-hairpin from one
of the α subunits (α 184–191) (Fig. 2g). This contact probably limits
the further ratcheting of the shelf module. Therefore, the ratcheted state
is mostly at one extremity in the structural spectrum of bacterial RNAP,
whereas the tight state observed in previous ECs should be the other
extremity.

The shelf module ratcheting results in a composite movement that
expands the hybrid-binding site and shifts the shelf module forward,
relative to the core module (Fig. 2e, f, Supplementary Text 6, Supplementary
Fig. 13). The bridge helix exposes its central part (β′ 1084–1092),
but buries its N-terminal part (β′ 1070–1083) and the carboxy-terminal
part (β′ 1093–1102) within the core and shelf modules, respectively.
As the two modules ratchet, the N- and C-terminal parts of the
bridge helix shift relative to each other. Consequently, the conformation
of β′ Thr 1088–Gly 1092 in the central part changes dramatically from
the continuous α-helix in the previous ECs, and the two discontinuous
α-helices are connected by the two non-helical residues, Ser 1091 and
Gly 1092 (Fig. 3a). It is intriguing that mutations of these two residues
reportedly affect the RNAP activity. On the other hand, the bridge
helix is kinked in the RNAP structures of T. aquaticus core enzyme
(α2ββ′ω) and T. thermophilus holoenzyme (α2ββ′ωσ) without nucleic
acids. However, the present kinked conformation is quite different
from the previous ones (Supplementary Text 7 and Supplementary Fig.
14). Here, the two α-helices that are directly connected to the trigger
loop are packed against the bridge-helix C-terminal region, and are
therefore shifted together with it. To avoid steric hindrance with the
tips of these two helices, the residues β′ Thr 1088, Ala 1089, Asp 1090
and Ser 1091 (E. coli β′ Thr 790, Ala 791, Asn 792 and Ser 793) protrude
into the DNA·RNA hybrid binding site (Fig. 3b, Supplementary
Text 6, Supplementary Fig. 13). Therefore, the conformational change
of the bridge helix is likely to occur synchronously with the transition
to the ratcheted state (Supplementary Text 7, 8). The direct interaction
between Gfh1 and RNAP may affect the fine conformation of the
bridge helix, by immobilizing its N-terminal region (see below).

Within the secondary channel, the Gfh1 NTD interacts tightly with
parts of the shelf module (including the trigger loop) and the core
module (including the secondary-channel coiled coil (β′ 958–1014)
and the N-terminal part of the bridge helix), and fits particularly well
with the narrowest region of the secondary channel (Fig. 3c, Supplementary
Text 9, Supplementary Fig. 15, Supplementary Movie 1). The interaction
between the Gfh1 NTD and the N-terminal part of the bridge
helix seems to maintain the straight conformation of the N-terminal
part, and to define the kinking point of the bridge helix in EC·Gfh1. It
is impossible for the Gfh1 NTD to bind to the secondary channel in the
tight EC in the same manner, as the channel is too narrow (Fig. 3d).
Leu 33 of the Gfh1 NTD is located in the narrowest region of the
secondary channel. A Gfh1 mutant with Leu 33 replaced by Trp
(L33W) lacked transcription inhibition activity, probably because of
an inability to bind (Fig. 3e, Supplementary Fig. 16). The bulky side
chain of Trp seems to prevent the Gfh1 NTD from penetrating into
the channel. This observation also indicates that the secondary channel
cannot open beyond the width of the present ratcheted state.
Consequently, Gfh1 just fits into the well-defined ratcheted state.
Considering that Gfh1 cannot bind to RNAP in the tight state, because
of steric hindrance, it is reasonable to postulate that Gfh1 traps a
dynamically occurring, ratcheted state of RNAP. To examine this
possibility, we performed crosslinking experiments. The results showed
that an artificial disulphide bond or photo-crosslink was formed at the
ratcheted state-specific interface between the core and shelf modules,
even in the absence of Gfh1 (Supplementary Text 10, Supplementary
Figs 17–19). Therefore, RNAP might spontaneously and dynamically
alter its conformation, from the tight state to the ratcheted state,
although the result could also be explained by more localized conformational
fluctuations.

Owing to the interaction of the Gfh1 NTD with the narrowest part of
the secondary channel, NTP entry would be prevented. Moreover, the
presence of the Gfh1 NTD is compatible with the unfolded conformation
of the trigger loop, but not with the NTP-induced, folded conformation
(Supplementary Fig. 20). The tip of the Gfh1 NTD is
located near the RNAP active site (Supplementary Text 11, Supplementary
Fig. 21). The end of the tip loop occupies the binding site
of the β-γ phosphate groups in the NTP insertion step of the nucleotide
addition reaction (Fig. 3f). Furthermore, Gfh1 seems to stabilize the
kinked bridge helix (Fig. 3b). An antibiotic, streptolydigin, also reportedly
inhibits transcription by immobilizing the bridge helix in a fixed conformation.
Taken together, these observations provide the explanation
for the inhibition of transcription elongation by Gfh1 (Supplementary
Text 12, and Supplementary Figs 22, 23).

On the other hand, the C-terminal domain (CTD) of Gfh1 is bound
to the edge of the secondary-channel coiled coil of the RNAP
(Fig. 1a, b, Supplementary Text 13, Supplementary Fig. 24). The interaction
involves the hydrophobic patch on the surface of the Gfh1 CTD
(Fig. 3g). We therefore prepared a Gfh1 protein with the M125E mutation
within the hydrophobic patch, and observed a loss of inhibition
activity (Fig. 3e). Thus, the interaction between the hydrophobic
patch of the Gfh1 CTD and the secondary-channel coiled coil is
required for Gfh1 to bind to RNAP. Other Gre factors, such as GreA
and GreB, share structural and sequence similarities with Gfh1 (refs 2–8).
In particular, the presence of the hydrophobic patch on the CTD is well
conserved. In fact, the M124E mutation of E. coli GreB (M125E of Gfh1)
also reduces the transcript cleavage activity of GreB. Therefore, the Gre
factors seem to share a common interaction mode between the hydrophobic
patch and the secondary-channel coiled coil, and probably bind
to the ratcheted EC in a similar manner to Gfh1 (Supplementary Text 13,
14). Transcript cleavage stimulated by GreA and GreB would be performed
in the ratcheted state. Although Pol II EC also changes its structure
upon TFIIS binding, the observed change is much smaller than
the transition to the ratcheted state of T. thermophilus EC upon Gfh1
binding (Supplementary Text 15).

The conformational changes of RNAP observed in the present struc- 
ture, including the shelf module ratcheting and the clamp swinging, 
might have functional relevance to other stages of the transcription 
reaction, as the conformational changes should modulate the inter- 
actions of RNAP with nucleic acids. Therefore, we suggest that the 
conformational changes may play distinct roles in RNAP translocation 
and transcription termination (Supplementary Text 16, 17, Supplementary
Figs 25–27). Experimental tests of these hypotheses will
be required to assess the importance of these conformational changes
in the absence of Gfh1.


METHODS SUMMARY 

Structure determination. The structure of crystal 1 was solved by molecular
replacement, using the coordinates of RNAP in T. thermophilus EC (PDB
2O5I) as the search model. There are three RNAP molecules in the asymmetric
unit, and each RNAP is bound with Gfh1. As the relative positions of the Gfh1
NTD and CTD differ from those in free Gfh1 (PDB 2F23), they were separately
placed in the electron density map. The model of the DNA·RNA hybrid was built
in the extra electron density in the DNA·RNA hybrid channel. We further remodelled
the coordinates of both the proteins and the nucleic acids with the program Coot.
Atomic positions and grouped B-factors were refined to 4.1 Å, by using the CNS
program (Supplementary Table 1). The refinement converged to R and Rfree values
of 26.2% and 31.8%, respectively; the latter was calculated from a randomly chosen 3%
of reflections excluded from the refinement. The structure of crystal 2 was solved by
molecular replacement, using the coordinates of the RNAP in crystal 1 as the search
model. The models of Gfh1 and the DNA·RNA hybrid were built in the extra electron
density. Refinement of the coordinates was performed to 4.3 Å with CNS. The final R
and Rfree values are 31.4% and 33.8%, respectively.
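For readers unfamiliar with the R and Rfree statistics quoted above, the Python sketch below shows the conventional definition, R = Σ||Fobs| − |Fcalc|| / Σ|Fobs|, and a random 3% hold-out of reflections for Rfree. It is a schematic illustration and not the CNS implementation: it assumes the calculated amplitudes are already on the scale of the observed ones, and the function names, seed and amplitudes are hypothetical.

```python
import random


def r_factor(f_obs, f_calc):
    """Crystallographic R: sum of | |Fobs| - |Fcalc| | divided by sum of |Fobs|."""
    numerator = sum(abs(abs(o) - abs(c)) for o, c in zip(f_obs, f_calc))
    denominator = sum(abs(o) for o in f_obs)
    return numerator / denominator


def split_free_set(n_reflections, fraction=0.03, seed=0):
    """Randomly assign a fraction of reflection indices to the free (test) set.

    Returns (work_indices, free_indices). Reflections in the free set are left
    out of refinement and used only to compute Rfree.
    """
    rng = random.Random(seed)
    n_free = max(1, round(n_reflections * fraction))
    free = set(rng.sample(range(n_reflections), n_free))
    work = [i for i in range(n_reflections) if i not in free]
    return work, sorted(free)


# Hypothetical amplitudes.
f_obs = [10.0, 20.0]
f_calc = [9.0, 22.0]
print(r_factor(f_obs, f_calc))

work, free = split_free_set(1000)
print(len(work), len(free))
```

Reusing the same free-set indices for a second crystal amounts to passing the same `free` list to both calculations.
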

Transcription inhibition analysis. We prepared two mutant Gfh1 proteins 
(L33W and M125E). The elongation complex was reconstituted by incubating T. 
thermophilus RNAP with a nucleic acid scaffold containing template DNA, non- 
template DNA, and RNA. The RNA was 5′-radiolabelled using T4 polynucleotide
kinase and [γ-32P]-ATP. The nucleotide addition reaction was performed by incubating
the transcription elongation complex in the presence of Gfh1 (the wild type
or one of the mutants). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS


Structure determination. Crystallization and data collection were described
previously. The data were reprocessed with the XDS program. The structure
for crystal 1 was solved by molecular replacement with the program Phaser,
using the coordinates of the core enzyme portion of T. thermophilus EC (PDB
2O5I) as the search model. The asymmetric unit contains three RNAP molecules.
Each RNAP was divided into 25–26 rigid bodies, and their positions were refined
with the program CNS version 1.2. Several of the rigid bodies deviated substantially
from the electron density, and they were manually adjusted to the density
with the program Coot. For the tip portion of the β′ non-conserved domain
(β′ NCD, β′ 132–454), the coordinates of the T. thermophilus holoenzyme (PDB
3DXJ) were used. Several rounds of rigid body refinement and manual adjustment
were performed. In each RNAP molecule, extra electron density, corresponding
to Gfh1, was observed. The NTD and CTD coordinates of free Gfh1 (PDB
2F23) were separately placed in the 2Fo − Fc electron density map. One of
the RNAP molecules in the asymmetric unit exhibited extra electron density
corresponding to the DNA·RNA hybrid in the DNA·RNA hybrid binding site,
for which we built the hybrid model. The coordinates of the DNA·RNA hybrid in
S. cerevisiae EC (PDB 2VUM) were used as the starting model. The electron
density for the nucleic acids in the other two RNAP complexes was weak, probably
owing to low occupancy and/or high mobility, and therefore, we did not build their
models.

The structures of the N- and C-terminal parts of the bridge helix in EC·Gfh1 are
similar to those in the previous EC (2O5I), while the central part of the bridge helix
in the present complex exhibits a conformational change, due to the ratcheting of
the core and shelf modules. The region of β′ 1086–1090 assumes a helical, but
slightly curved conformation, and the model was built by adjusting the corresponding
region in the previous EC (2O5I) to the electron density. Most parts of the bridge
helix (β′ 1070–1090 and β′ 1093–1102) maintained the helical conformation.
β′ Ser 1091 and β′ Gly 1092 were placed to link the two discontinuous helices, while
fitting their main chains into the electron density. For this rebuilding, the position of
β′ Tyr 1093, which was identified by the electron density of its large side chain, was
helpful (Fig. 3a). The coordinates of both the proteins and the nucleic acids were
further refined with the program Coot. The atomic positions and the grouped
B-factors were refined to 4.1 Å, by using CNS with strong NCS restraints among
the three complexes in the asymmetric unit (Supplementary Table 1). Refinement
was monitored by Rfree, calculated from 3% of the reflections that were excluded
from the refinement.

The structure for crystal 2 was solved by molecular replacement, using the
coordinates of RNAP and Gfh1 in crystal 1 as the search model. A model of
the DNA·RNA hybrid was placed in each RNAP in the electron density map,
and positional refinement of the coordinates was performed to 4.3 Å with CNS.
For the calculation of Rfree, the same reflections as those chosen for crystal 1 were
used.

The rigid bodies used in the structural refinement allowed us to define the
mobile modules of T. thermophilus RNAP. We first superposed the RNAPs in
the present EC·Gfh1 and the previous EC (2O5I) by certain rigid bodies, and
then inspected the rigid bodies that superposed well concurrently. The clusters of
co-relocated rigid bodies were defined as modules. Finally, we confirmed with the
RigidFinder program that the defined mobile modules relocated separately.

Disulphide-bonding assay for the ratcheted RNAP. We constructed a plasmid
that allows co-expression of the α, β, β′ and ω subunits of T. thermophilus RNAP
in E. coli, for the preparation of recombinant T. thermophilus RNAP (pRpoBCAZ,
to be published elsewhere). The recombinant RNAPs (the wild type and the α
Q188C–β′ D685C mutant) were expressed using this system. They were purified
by the procedure used for the natural RNAP core enzyme from T. thermophilus
cells, except that the cell lysate was heat-treated at 70°C for 30 min, in order to
denature most of the non-thermophilic E. coli proteins. Then, the RNAPs
were fractionated by polyethyleneimine precipitation, followed by ammonium
sulphate precipitation. The recombinant RNAPs were further purified by chromatography
on Q-Sepharose and Superdex pg200 columns (GE Healthcare
Biosciences).

For the disulphide-bonding analysis, the recombinant RNAP (the wild type or 
the α Q188C-β′ D685C mutant) was dissolved in 75 mM Tris-HCl buffer (pH 8.1), 
containing 50 mM KCl, 10 mM MgCl2 and 1 mM DTT. Each of the RNAPs 
(0.2 µM) was incubated with 0.7 µM of the nucleic acid scaffold (DNATS/ 
DNANT/RNA14) or 10 µM of T. thermophilus Gfh1 for 30 min. Then, 2.5 mM 
glutathione disulphide (GSSG) was added to each RNAP solution for the mild 
oxidation of Cys residues. The mixtures were analysed by SDS-PAGE, using 
sample buffer lacking a reducing agent. The formation of a disulphide bond 
between α C188 and β′ C685 was confirmed by the appearance of an extra band 
with low mobility, which corresponded to the crosslinked α and β′ subunits 
(Supplementary Fig. 17). The sequences of the nucleic acids are as follows. The 
RNA oligomer: RNA14, UUUUUGAGUCUGCGGCGAU. The DNA oligomers: 
DNATS, AACATACGGCTCGGACAGAGGTCCTGTCTGAATCGATATCGCCGC; 
DNANT, CGATTCAGACAGGACCTCTGTCCGAGCCGTATGTT. The 
nucleic acid scaffold was designed by modifying the previously reported EC14 
scaffold, which forms a stable elongation complex with T. thermophilus RNAP. 
Photo-crosslinking assay of the ratcheted RNAP. p-Benzoyl-L-phenylalanine 
(pBpa) is a photo-crosslinker that can be position-specifically incorporated into 
a recombinant protein. The gene encoding a pBpa-specific variant of 
Methanococcus jannaschii tyrosyl-tRNA synthetase, under the control of the 
E. coli tyrS promoter, was cloned in the pACYC184 vector, together with three 
copies of the M. jannaschii amber suppressor tRNA gene, to create a vector for 
the expression of the pBpa-specific tRNA synthetase and tRNA (ppbpaRS- 
3MJR1). Each tRNA gene had an E. coli lpp promoter and an rrnC terminator. 
The artificial operons for overproducing minor tRNA species, including the minor 
tRNA"®, were described previously’’, and were cloned in a kanamycin-resistant 
plasmid carrying the CloDF13-derived replication origin, to create pMINOR2. 

The rpoA gene, C-terminally tagged with FLAG, was engineered, using a 
QuikChange mutagenesis kit (Stratagene), to have an amber codon in place of 
Arg 185, for producing the RNAP α-subunit with Arg 185 replaced with pBpa 
(pBpa 185) (ref. 41). The rpoC gene was engineered to have a methionine codon 
in place of Glu 692, for producing the β′ subunit with the E692M substitution. The 
rpoA and rpoC genes in pRpoBCAZ were replaced by these mutant genes, and the 
vector was introduced into BL21 Star(DE3) cells (Invitrogen) harbouring the 
ppbpaRS-3MJR1 and pMINOR2 plasmids. The cells were grown in LB medium 
containing 1 mM pBpa, and the gene expression was induced by the addition of 
1 mM IPTG at the mid-log phase. After a further 4-h incubation, the cells were 
harvested and lysed by sonication in buffer A (40 mM Tris-HCl (pH 7.7), 500 mM 
NaCl, 10 mM EDTA, 10 mM 2-mercaptoethanol, 5% glycerol, and Complete 
protease inhibitor cocktail tablets (Roche)). The wild-type and engineered 
RNAP core enzymes were roughly purified by heat-treatment. For the photo- 
crosslinking, the proteins were exposed to light at 365 nm for 30 min on ice in 
a 24-well cell culture plate (BD Biosciences), followed by SDS-PAGE, Coomassie 
brilliant blue staining, and western blotting with an anti-FLAG antibody (Sigma) 
(Supplementary Fig. 19a, 19b). The RNAP core enzymes were further purified as 
described above. Then, 2 µM RNAP (wild type or mutant) was incubated with the 
nucleic acid scaffold (2.5 µM) or Gfh1 (10 µM) for 30 min at room temperature in 
50 mM HEPES-NaOH buffer (pH 7.5), containing 50 mM KCl, 10 mM MgCl2 and 
1 mM DTT, followed by the photo-crosslinking step (Supplementary Fig. 19c). 
Transcription inhibition analysis. In the previous study, we constructed a plasmid 
for the expression of wild-type Gfh1 in E. coli. The expression plasmids for Gfh1 
variants (L33W and M125E) were generated by introducing mutations to the plasmid 
encoding wild-type Gfh1. In addition, we constructed plasmids for the expression 
of wild type and mutant T. thermophilus GreA in E. coli. The wild-type and mutant 
Gre proteins were expressed as described previously, and were then purified by 
chromatography on Toyopearl Super-Q and Butyl columns (Tosoh Bioscience). 
The transcription elongation complex was reconstituted by incubating 0.1 µM of 
T. thermophilus RNAP with 0.1 µM of the nucleic acid scaffold (DNATS/DNANT/ 
RNA14 or RNA15) for 30 min, where the sequence of RNA15 is 
UUUUUGAGUCUGCGGCGAUA. The RNA was 5′-radiolabelled using T4 polynucleotide 
kinase and [γ-32P]ATP. The nucleotide addition reaction was performed by 
incubating the transcription elongation complex of RNA14 with 5 µM Gfh1 (wild 
type or a mutant) at 20 °C in 50 mM MES-NaOH buffer (pH 6.5), containing 
50 mM KCl, 10 mM MgCl2, 1 mM DTT and 20 µM ATP, and that of RNA15 with 
2.5 µM Gfh1 (wild type or a mutant) at 55 °C in 50 mM MES-NaOH buffer (pH 
6.5), containing 50 mM KCl, 1 mM MgCl2, 1 mM DTT and 20 µM UTP. The RNA 
was analysed by denaturing (8 M urea) PAGE. 
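The scaffold sequences listed above can be checked for internal consistency. The following is a minimal Python sketch, not part of the original protocol: it tests whether DNANT is the reverse complement of the 5′ portion of DNATS and infers which nucleotide the template specifies next for each RNA. The function names are illustrative. Two assumptions are made: RNA14 is taken to be RNA15 without its 3′-terminal A, and the RNA-DNA hybrid is taken to extend to the 3′ end of the template strand.

```python
# Sequences as given in the Methods, written 5'->3'.
DNATS = "AACATACGGCTCGGACAGAGGTCCTGTCTGAATCGATATCGCCGC"  # template strand
DNANT = "CGATTCAGACAGGACCTCTGTCCGAGCCGTATGTT"            # non-template strand
RNA15 = "UUUUUGAGUCUGCGGCGAUA"
RNA14 = RNA15[:-1]  # assumption: RNA14 is RNA15 lacking the 3'-terminal A

DNA_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}
RNA_TO_TEMPLATE = {"A": "T", "U": "A", "G": "C", "C": "G"}
TEMPLATE_TO_NTP = {"T": "A", "A": "U", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Return the reverse complement of a DNA sequence."""
    return "".join(DNA_COMPLEMENT[base] for base in reversed(dna))


def hybrid_length(rna, template):
    """Longest RNA 3' segment that pairs with the template, starting at the
    template 3' end (assumed hybrid geometry)."""
    template_3_to_5 = template[::-1]
    best = 0
    for k in range(1, min(len(rna), len(template)) + 1):
        paired = "".join(RNA_TO_TEMPLATE[base] for base in rna[-k:])
        if paired == template_3_to_5[:k]:
            best = k
    return best


def next_ntp(rna, template):
    """Nucleotide specified by the first unpaired template base."""
    k = hybrid_length(rna, template)
    return TEMPLATE_TO_NTP[template[::-1][k]]


if __name__ == "__main__":
    print("DNANT pairs with DNATS 5' region:",
          reverse_complement(DNANT) == DNATS[:len(DNANT)])
    for name, rna in (("RNA14", RNA14), ("RNA15", RNA15)):
        print(name, "hybrid length:", hybrid_length(rna, DNATS),
              "next NTP:", next_ntp(rna, DNATS))
```

Under these assumptions, the inferred next nucleotides agree with the ATP and UTP supplied in the nucleotide addition reactions for RNA14 and RNA15, respectively.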
Single-molecule imaging reveals mechanisms of 
protein disruption by a DNA translocase 


Ilya J. Finkelstein, Mari-Liis Visnapuu & Eric C. Greene 


In physiological settings, nucleic-acid translocases must act on sub- 
strates occupied by other proteins, and an increasingly appreciated 
role of translocases is to catalyse protein displacement from RNA 
and DNA. However, little is known regarding the inevitable colli- 
sions that must occur, and the fate of protein obstacles and the 
mechanisms by which they are evicted from DNA remain un- 
explored. Here we sought to establish the mechanistic basis for 
protein displacement from DNA using RecBCD as a model system. 
Using nanofabricated curtains of DNA and multicolour single- 
molecule microscopy, we visualized collisions between a model 
translocase and different DNA-bound proteins in real time. We 
show that the DNA translocase RecBCD can disrupt core RNA 
polymerase, holoenzymes, stalled elongation complexes and tran- 
scribing RNA polymerases in either head-to-head or head-to-tail 
orientations, as well as EcoRI(E111Q), lac repressor and even nucleo- 
somes. RecBCD did not pause during collisions and often pushed 
proteins thousands of base pairs before evicting them from DNA. 
We conclude that RecBCD overwhelms obstacles through direct 
transduction of chemomechanical force with no need for specific 
protein-protein interactions, and that proteins can be removed 
from DNA through active disruption mechanisms that act on a 
transition state intermediate as they are pushed from one non- 
specific site to the next. 

RecBCD is a heterotrimeric translocase involved in initiating homo- 
logous recombination and processing stalled replication forks. RecB is 
a 3′→5′ SF1A helicase and contains a nuclease domain for DNA 
processing, RecD is a 5′→3′ SF1B helicase and RecC holds the com- 
plex together and coordinates the response to cis-acting Chi (crossover 
hot-spot instigator) sequences (5′-dGCTGGTGG-3′). RecD is the lead 
motor before Chi, RecB is the lead motor after Chi and Chi recognition 
is accompanied by a reduced rate of translocation corresponding to the 
slower velocity of RecB. Chi prompts RecBCD to process DNA, yield- 
ing 3′ single-stranded DNA overhangs onto which RecA is loaded. 

We monitored RecBCD activity using total-internal-reflection fluor- 
escence microscopy and a DNA curtain assay that allows us to visualize 
hundreds of aligned molecules (Supplementary Fig. 1). When assayed 
on DNA curtains, RecBCD displayed rapid translocation (1,484 ± 167 
base pairs per second (bp s⁻¹), 37 °C, 1 mM ATP, N = 100; Supplemen- 
tary Fig. 1b, c), high processivity (36,000 ± 12,500 bp) and decreased 
velocity in response to Chi (549 ± 155 bp s⁻¹, 37 °C, 1 mM ATP, 
N = 100; Supplementary Fig. 1), in agreement with previous studies. 

Escherichia coli contains ~2,000 molecules of RNA polymerase 
(RNAP), and =65% of these are bound to the bacterial chromosome’’, 
making RNAP one of the most commonly encountered obstacles in 
physiological settings. RNAP is of special interest because it is a high- 
affinity DNA-binding protein (dissociation constant, Kd ~ 10 pM for 
λPR and 100 pM for λPL) and a powerful translocase capable of 
moving under an applied load of ~14-25 pN (ref. 11). RNAP survives 
encounters with replication forks and stalls fork progression in 
head-on collisions, suggesting that RNAP is among the most for- 
midable roadblocks encountered in vivo. During replication restart, 
RecBCD translocates towards oriC; therefore, most collisions with 


RNAP will occur in a head-on orientation, suggesting that to survive 
these encounters RecBCD would need to exert more force than a 
replisome. 

We used quantum dots (QDs) to fluorescently label RNAP (Sup- 
plementary Information). The binding distribution of QD-RNAP 
holoenzyme overlapped with known promoters (Fig. 1a), promoter 
targeting was σ70 dependent and promoter-bound holoenzymes were 
highly stable (t1/2 = 23.2 ± 1.42 min (half-life), N = 58; Supplemen- 
tary Fig. 3a, b, c). Core QD-RNAP dissociated when challenged with 
heparin (t1/2 = 3.4 ± 0.03 s, N = 150), whereas promoter-bound 
Figure 1 | RecBCD removes RNAP from DNA. a, Distribution of QD-RNAP 
bound to λ DNA. Locations of promoters are indicated; those facing left are 
shown in blue, those facing right are shown in red. The inset shows examples of 
YOYO-1-stained λ DNA (green) bound by RNAP (magenta). The tethered end 
of the DNA is on the left, and the free end of the DNA is on the right. kb, 
kilobase. b, Kymograms of RecBCD colliding with RNAP core, holoenzyme, 
stalled elongation complex (EC), and stalled elongation complex chased with 
ribonucleoside triphosphate (rNTP). Gaps in magenta traces correspond to 
quantum dot blinking. In these and all subsequent kymograms, the tethered 
end of the DNA is at the top, the free end is at the bottom, and buffer flow is 
from top to bottom. c, Distribution of event types. d, Tracking data for 
collisions, with traces aligned at the collisions. 
holoenzyme was heparin resistant (t1/2 ≫ 6.7 min, N = 58), confirm- 
ing open complex formation (Supplementary Fig. 3c, d). Bulk assays 
verified that QD-RNAP produced transcripts (Supplementary Fig. 
3e), and single-molecule assays revealed a transcription velocity of 
15.7 ± 8.6 bp s⁻¹ (N = 20, 25 °C, 250 µM of each ribonucleoside tri- 
phosphate; Supplementary Fig. 3f). 

When RecBCD collided with RNAP, the polymerase was rapidly 
ejected from DNA (t1/2 = 2.4 ± 0.13 s; Fig. 1b). Remarkably, RNAP 
could be pushed long distances (10,460 ± 7,690 bp, N = 44; Fig. 1 and 
Supplementary Fig. 4) and RecBCD could disrupt core RNAP, holoen- 
zymes, stalled elongation complexes and active elongation complexes 
(Fig. 1b and Supplementary Fig. 5). Out of 47 collisions with QD- 
RNAP holoenzyme, 15% (7 of 47) immediately stalled RecBCD, 8.5% 
(4 of 47) resulted in dissociation of RNAP with no sliding, 76.5% (36 of 
47) of RNAP was pushed and 71% of pushed molecules were even- 
tually ejected (Fig. 1c). The population of RNAP molecules that was 
directly ejected from the DNA increased ~5-fold for stalled and active 
elongation complexes (Supplementary Fig. 5). RecBCD also pushed 
and evicted RNAP labelled with 40-nm fluorescent beads or Alexa 
Fluor 488, arguing against nonspecific interactions between RecBCD 
and the quantum dots (Supplementary Fig. 6a). RecBCD did not slow 
or pause on colliding with RNAP (Fig. 1d and Supplementary Fig. 4a), 
nor was there any reduction in processivity in comparison with naked 
DNA (29,000 ± 15,500 bp). Similar outcomes were observed before 
and after Chi (not shown), indicating that RecBCD could dislodge 
RNAP regardless of whether RecB or RecD was the lead motor. We 
could unambiguously assign the orientation of RNAP at λPR (Fig. 1a 
and Supplementary Fig. 3f), and RecBCD dislodged RNAP bound at 
λPR during collisions in either direction (Fig. 1b and Supplementary 
Fig. 7a). RecBCD also pushed and ejected RNAP bound at all other 
locations regardless of DNA orientation (Fig. 1b). RecBCD even dis- 
lodged RNAP at lower velocities (446 ± 192 bp s⁻¹, 122 ± 128 bp s⁻¹ 
and 78 ± 27 bp s⁻¹ at 100 µM, 25 µM and 15 µM ATP, respectively; 
see Supplementary Fig. 7 and below), indicating that proteins could be 
dislodged even under suboptimal translocation conditions. We con- 
clude that RecBCD disrupts RNAP regardless of orientation, tran- 
scriptional status or translocation velocity. 

We next asked whether RecBCD could dislodge other proteins. 
EcoRI(E111Q) is a catalytically inactive version of EcoRI, which has high 
affinity (Kd = 2.5 fM) for cognate sites and even binds tightly to non- 
specific DNA (Kd = 4.8 pM). EcoRI(E111Q) can halt E. coli RNA poly- 
merase; T7 and SP6 RNA polymerases; SV40 large T antigen; 
E. coli UvrD, DnaB and T4 Dda helicases; SV40 replication forks; 
and E. coli replication forks. EcoRI withstands up to ~20-40 pN 
(ref. 22), and EcoRI(E111Q) binds cognate sites ~3,000-fold more strongly than 
wild-type EcoRI (Kd = 6.7 pM); thus, we infer that the catalytic 
mutant can resist at least as much force as the wild-type protein. lac 
repressor (LacI) is representative of a large family of bacterial tran- 
scription factors that has served as a model for transcriptional regu- 
lation and protein-DNA interactions. LacI binds tightly to specific 
sites (Kd = 10M for a 21-bp symmetric operator) but binds weakly 
to nonspecific DNA (Kd = 1 nM) and slides rapidly along nonspecific 
DNA rather than remaining at fixed locations. LacI also blocks 
RNAP and replication forks both in vitro and in vivo, highlighting 
that it is a potent and physiologically relevant barrier to translocase 
progression. 

We labelled EcoRI(E111Q) and LacI with quantum dots (Supplemen- 
tary Information), and QD-EcoRI(E111Q) and QD-LacI bound to the 
correct locations on the DNA substrates, confirming that the tagged 
proteins retained normal DNA-binding activity (Fig. 2a and Sup- 
plementary Fig. 8). QD-LacI was rapidly released from DNA by 
isopropyl-β-D-thiogalactoside, as expected (Supplementary Fig. 9). 
When RecBCD collided with EcoRI(E111Q), it pushed the proteins 
13,000 ± 9,100 bp (N = 70) before ejecting them from the DNA 
(Fig. 2b, c). In contrast, LacI was immediately ejected, and was not 
pushed within our resolution limits (Fig. 2b-d). There was no change 
Figure 2 | Disruption of EcoRI(E111Q) and lac repressor by RecBCD. 
a, Histogram of EcoRI(E111Q) (upper panel, N = 1,481) and LacI (lower panel, 
N = 700) bound to λ DNA. The locations of the five EcoRI sites found in λ 
DNA are indicated, along with examples of QD-EcoRI(E111Q) bound to YOYO- 
1-stained λ DNA (inset, upper panel) and examples of QD-LacI bound to the 
DNA (inset, lower panel). b, Kymograms showing RecBCD colliding with 
EcoRI(E111Q) and LacI (magenta), as indicated. d, Distribution of event types for 
EcoRI(E111Q) and LacI. e, Tracking data for individual collisions. 


in velocity or processivity upon colliding with either protein (Fig. 2b, d 
and Supplementary Fig. 4). Out of 70 collisions with QD-EcoRI(E111Q), 
7.1% (5 of 70) stalled the translocase, 11.4% (8 of 70) resulted in 
immediate dissociation of EcoRI(E111Q) with no detectable sliding, 
81.4% (57 of 70) of EcoRI(E111Q) was pushed along DNA and 92% of 
pushed molecules were eventually ejected (Fig. 2c). Out of 30 collisions 
with LacI, 3.3% (1 of 30) stalled the translocase, 93.3% (28 of 30) 
resulted in immediate dissociation of LacI with no detectable sliding 
and 3.3% (1 of 30) showed sliding before dissociation (Fig. 2c). A 
greater fraction of LacI might slide, but if so, the sliding events fall 
outside our resolution limits. Control experiments confirmed that 
RecBCD disrupted EcoRI(E111Q) labelled with fluorescent beads or 
Alexa Fluor 488 (Supplementary Fig. 6b). As with RNAP, RecBCD 
could strip EcoRI(E111Q) after Chi (not shown) and also disrupted 
EcoRI(E111Q) and LacI during low-velocity collisions (see below). 
These findings confirm that RecBCD readily displaces tightly bound 
proteins from DNA. 

In eukaryotes, nucleosomes are the most frequently encountered 
DNA-bound obstacles. Replisomes, transcription machinery and ATP- 
dependent chromatin remodellers all act through mechanisms requir- 
ing force generation, and the response of nucleosomes to these forces 
remains a long-standing question in chromatin biology. Heterologous 
systems have revealed fundamental principles underlying these pro- 
cesses: experiments with SP6 RNAP provided a theoretical frame- 
work for nucleosome repositioning, and studies with phage T4 
proteins were among the first to address the fate of nucleosomes during 
replication. Eukaryotic translocases exert forces in the same net 
direction as RecBCD, and RecBCD can unwind nucleosome-bound 
DNA, arguing that it can serve as a good protein-based force probe 
for studying the fate of nucleosomes when rammed by a translocase. 

Recombinant nucleosomes were deposited on DNA curtains by salt 
dialysis, as described. Remarkably, RecBCD could push nucleosomes 
(7,311 ± 5,373 bp, N = 75; Fig. 3), and similar results were obtained 
with fluorescently labelled H2A-H2B dimer or H3-H4 tetramer 
(Fig. 3a). Control experiments demonstrated that RecBCD could also 
push nucleosomes labelled with either fluorescent beads or Alexa Fluor 
488 (Supplementary Fig. 6c). Out of 357 collisions with nucleosomes, 
24% (84 of 357) immediately stalled RecBCD, 11% (40 of 357) resulted 
in direct nucleosome ejection, 65% (233 of 357) led to sliding (Fig. 3b) 
and ~50% of these were eventually ejected (t1/2 = 3.93 ± 0.21 s; Fig. 3b 
and Supplementary Fig. 4c). Nucleosomes reduced the processivity of 
RecBCD to 14,000 ± 7,000 bp, as anticipated, and the translocase 
stalled in a larger fraction of these collisions (24%) than in collisions 
with RNAP (15%), EcoRI(E111Q) (7%) and LacI (3.3%). Relative to the 
other roadblock proteins, fewer of the pushed nucleosomes (50%) were 
subsequently ejected from the DNA, and there was a 10% reduction 
(t-test, P = 0.0005) in velocity while pushing nucleosomes (Fig. 3c and 
Supplementary Fig. 4c). These results demonstrate that intact nucleo- 
somes can be pushed along DNA as theoretically predicted, but 
indicate that RecBCD had more difficulty pushing and evicting 
nucleosomes than it did the other protein roadblocks. The finding that 
RecBCD pushes and evicts nucleosomes also rules out mechanisms 
requiring species-specific protein-protein interactions. 
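The outcome fractions quoted for the four roadblocks follow directly from the event counts given in the text. As a minimal sketch (the dictionary layout and function name are illustrative, not part of the original analysis), the percentages can be recomputed as follows:

```python
# Event counts as stated in the text: stalled / directly ejected / pushed.
COLLISIONS = {
    "RNAP holoenzyme": {"stalled": 7, "direct ejection": 4, "pushed": 36},
    "EcoRI(E111Q)": {"stalled": 5, "direct ejection": 8, "pushed": 57},
    "LacI": {"stalled": 1, "direct ejection": 28, "pushed": 1},
    "nucleosome": {"stalled": 84, "direct ejection": 40, "pushed": 233},
}


def outcome_percentages(counts):
    """Convert raw event counts into percentages of all collisions."""
    total = sum(counts.values())
    return {event: round(100.0 * n / total, 1) for event, n in counts.items()}


if __name__ == "__main__":
    for roadblock, counts in COLLISIONS.items():
        total = sum(counts.values())
        print(roadblock, f"(N = {total}):", outcome_percentages(counts))
```

Recomputing from counts rather than copying percentages makes rounding differences and transcription slips easy to spot.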

Protein disruption mechanisms can be described by at least four 
models, which differ in the nature of the mobile intermediates and the 
stage of the chemomechanical cycle during which the proteins dissociate 
(Fig. 4a). In the first model, passive release, the proteins (S) are dislodged 
from a high-affinity specific site and then pushed from one sequential 
nonspecific site to the next. Subsequent dissociation occurs sponta- 
neously simply because the proteins are bound to lower-affinity non- 
specific DNA (N). This model assumes that the proteins have similar 
low affinities for all nonspecific sites sampled, and predicts that the 
observed rates of RecBCD-induced dissociation (k_off,obs) would be 
similar to that of spontaneous dissociation from nonspecific DNA in 
the absence of RecBCD (k_off,obs = k_off,N). This model also predicts that 
the distance (d) over which proteins are pushed will be dictated by their 
affinity for nonspecific DNA and will be proportional to velocity (V) 
such that faster translocation will lead to longer distances and slower 
Figure 3 | Nucleosomes can be pushed along DNA. a, Kymograms showing 
RecBCD collisions with nucleosomes (magenta) that are labelled on either the 
H2A-H2B dimer or the H3-H4 tetramer, as indicated. b, Distribution of event 
types. c, Tracking data illustrating collisions between RecBCD and nucleosomes. 
Figure 4 | Protein displacement mechanisms. a, Models for protein 
displacement (see main text). b, Spontaneous dissociation versus RecBCD- 
induced dissociation as determined from single exponential fits to dissociation 
data ± s.d. Values for LacI nonspecific and RecBCD-induced dissociation 
represent upper bounds. Data are colour-coded for each different protein, as 
indicated. c, Representative sliding trajectories. Each line corresponds to the 
collision (right end point) and dissociation (left end point) for single proteins. 
d, RecBCD velocity (mean ± s.d.) at different ATP concentrations ([ATP]). 
e, Protein t1/2 at different [ATP] values. Error was ≤5.6% of the reported values. 
f, Pushing distances (mean ± s.e.m.) at different [ATP] values. 


translocation will yield shorter distances. The second model, preferred 
site release, accounts for a situation in which proteins encounter rare 
sequences of exceptionally low affinity (N′), such that they preferentially 
dissociate from these sites (k_off,N′ ≫ k_off,N). In the third model, struc- 
tural disruption, translocase collisions alter the conformation of the 
proteins (for example by permanently rupturing a subset of protein- 
DNA contacts) such that they persist as structurally perturbed com- 
plexes (X) after displacement from the high-affinity site. In this case, the 
mobile intermediates have a characteristic lifetime (τ_X) dictated by their 
weakened affinity for DNA, and this lifetime should be insensitive to 
translocation velocity. Therefore, the distance (d) over which proteins 
are pushed will be proportional to velocity (V), and faster translocation 
will lead to longer distances whereas slower translocation will yield 
shorter distances. The most important feature of this model, which 
distinguishes it from all of the other models, is that the structurally 
disrupted proteins are more weakly bound to DNA specifically as a 
consequence of the collision, such that the observed rate of RecBCD- 
induced dissociation (k_off,obs) would be greater than the rate of spon- 
taneous dissociation from nonspecific DNA (k_off,obs ~ k_off,X ≫ k_off,N). 
The fourth model, transition state ejection, is characterized by a series of 
tightly bound nonspecific complexes (N) that must pass through a 
weakly bound transition state (T) as they are pushed from one position 
to the next. This model predicts that dissociation occurs predominantly 
during the transition state (k_off,T ≫ k_off,N). The time required to pass 
through the transition state during one round of the chemomechanical 
cycle is equivalent to the time required for the translocase to take a single 
step (k_step), which is a fixed intrinsic value independent of ATP concen- 
tration. This relationship can be rationalized by considering that the 
velocity of RecBCD can be controlled by modulating ATP concentra- 
tion (see below), with slower velocities resulting from longer dwell times 
between steps (while awaiting new ATP) rather than from changes in 
k_step. Therefore, the time it takes the roadblock to pass through the 
transition state during a single step will be independent of ATP con- 
centration, whereas the cumulative time spent in the transition state will 
increase linearly with step number (m) irrespective of the overall 
observed translocation velocity. The probability of dissociation will then 
increase with step number, the observed lifetimes will be inversely pro- 
portional to velocity and the total distance the proteins are pushed 
before dissociation will be independent of velocity (that is, the road- 
blocks will be pushed similar distances regardless of how fast the trans- 
locase moves). 
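The contrasting predictions of the structural disruption and transition state ejection models can be made concrete with a toy calculation. The sketch below is illustrative only and is not the authors' analysis: the intermediate lifetime (7 s) and the per-base-pair ejection probability (1e-4) are hypothetical values chosen for demonstration, ejection is treated as occurring with a fixed probability per base pair of travel, and only the three velocities are taken from the text.

```python
import math


def structural_disruption(velocity_bp_s, lifetime_s):
    """Perturbed complex decays with a fixed lifetime, regardless of velocity.

    Returns (half-life in s, mean pushing distance in bp).
    """
    half_life = math.log(2) * lifetime_s
    distance = velocity_bp_s * lifetime_s
    return half_life, distance


def transition_state_ejection(velocity_bp_s, p_eject_per_bp):
    """Ejection occurs with a fixed probability per base pair travelled.

    Returns (half-life in s, mean pushing distance in bp); the half-life uses
    the continuous approximation, valid when p_eject_per_bp is small.
    """
    distance = 1.0 / p_eject_per_bp
    half_life = math.log(2) * distance / velocity_bp_s
    return half_life, distance


if __name__ == "__main__":
    velocities = (1484.0, 446.0, 78.0)  # bp/s, as reported in the text
    lifetime_s = 7.0                    # hypothetical value
    p_eject_per_bp = 1e-4               # hypothetical value
    for v in velocities:
        sd = structural_disruption(v, lifetime_s)
        ts = transition_state_ejection(v, p_eject_per_bp)
        print(f"V = {v:6.0f} bp/s | structural disruption: "
              f"t1/2 = {sd[0]:.1f} s, d = {sd[1]:.0f} bp | "
              f"transition state ejection: t1/2 = {ts[0]:.1f} s, d = {ts[1]:.0f} bp")
```

In this toy picture, slowing the translocase shortens the pushing distance under structural disruption but leaves it unchanged under transition state ejection, while only the latter lengthens the observed half-life.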

Each aforementioned model makes distinct predictions that can be 
experimentally evaluated. This evaluation is easier for RNAP, EcoRI(E111Q) 
and nucleosomes because these proteins are pushed long distances (LacI 
is considered separately below). We first measured dissociation of these 
proteins from specific and nonspecific sites in the absence of RecBCD 
(Supplementary Information), and compared these results to RecBCD- 
induced rates of dissociation (Fig. 4b). RNAP, EcoRI(E111Q) and nucleo- 
somes all bind tightly to nonspecific DNA, and RecBCD-induced 
dissociation was ≥200-fold faster than spontaneous dissociation from 
nonspecific sites, which is inconsistent with passive release. We next 
analysed pushing trajectories to determine whether there was any evid- 
ence supporting preferred site release. Comparison of these trajectories 
revealed that RecBCD-induced dissociation of all three roadblock 
proteins occurred at random locations (Fig. 4c), arguing against pre- 
ferred site release. To distinguish between structural disruption and 
transition state eviction, we compared protein lifetimes and pushing 
distances at four different translocation velocities (Fig. 4d). Remarkably, 
a 3.3-fold decrease in RecBCD velocity (446 ± 192 bp s⁻¹ at 100 µM 
ATP) led to 1.5-, 7.0- and 3.4-fold increases in the post-collision half- 
lives of EcoRI(E111Q), RNAP and nucleosomes (Fig. 4e), respectively, 
although the distribution of distances over which the proteins were 
pushed remained largely unaltered (Fig. 4f and Supplementary 
Table 1). This effect was even more obvious at 15 µM ATP, where a 
19-fold decrease in RecBCD velocity (78 ± 27 bp s⁻¹) led to 36-, 93- 
and 24-fold increases in the post-collision half-lives of EcoRI(E111Q), 
RNAP and nucleosomes, respectively, but pushing distances were 
either unaltered or increased in comparison with those corresponding 
to the faster velocities. These results indicated that dissociation was 
dictated by the number of steps the proteins were forced to take rather 
than the cumulative time it took to be pushed a given distance, which is 
most consistent with transition state ejection. Although our experi- 
ments did not reveal any evidence for a structural disruption eviction 
mechanism, this does not rule out the possibility that EcoRI(E111Q), 
RNAP and nucleosomes are structurally altered when acted upon by 
RecBCD. However, if they are structurally perturbed, this alone does 
not result in their eventual dissociation from DNA. 
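The fold changes quoted above can be reproduced from the reported velocities. The sketch below also derives a crude distance estimate by assuming that pushing distance scales as velocity multiplied by half-life; that scaling is an assumption introduced here for illustration and does not replace the measured distances in Fig. 4f and Supplementary Table 1. Variable and function names are illustrative.

```python
# Mean RecBCD velocities (bp/s) reported in the text.
V_SATURATING = 1484.0                 # 1 mM ATP
VELOCITY_AT_ATP = {100: 446.0, 15: 78.0}  # keyed by [ATP] in micromolar

# Reported fold increases in post-collision half-life.
HALF_LIFE_FOLD = {
    100: {"EcoRI(E111Q)": 1.5, "RNAP": 7.0, "nucleosome": 3.4},
    15: {"EcoRI(E111Q)": 36.0, "RNAP": 93.0, "nucleosome": 24.0},
}


def velocity_fold_decrease(atp_um):
    """Fold decrease in velocity relative to saturating ATP."""
    return V_SATURATING / VELOCITY_AT_ATP[atp_um]


def estimated_distance_ratio(atp_um, protein):
    """Crude estimate: (half-life fold increase) / (velocity fold decrease).

    Assumes distance is proportional to velocity x half-life. A value of 1
    means no change in distance; a pure fixed-lifetime mechanism would
    instead give 1 / (velocity fold decrease).
    """
    return HALF_LIFE_FOLD[atp_um][protein] / velocity_fold_decrease(atp_um)


if __name__ == "__main__":
    for atp_um in (100, 15):
        print(f"[ATP] = {atp_um} uM: velocity decreased "
              f"{velocity_fold_decrease(atp_um):.1f}-fold")
        for protein in HALF_LIFE_FOLD[atp_um]:
            ratio = estimated_distance_ratio(atp_um, protein)
            print(f"  {protein}: estimated distance ratio {ratio:.2f}")
```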

LacI differs from the other roadblocks in that it was immediately 
evicted from DNA, and the RecBCD-induced dissociation rate was 
comparable to the rate of spontaneous dissociation from nonspecific 
sites (Fig. 4b), which would seem consistent with a passive-release 
model. However, with current resolution limits we cannot completely 
rule out other mechanisms, and future studies will be necessary to fully 
address this issue. Importantly, RNAP, EcoRI(E111Q) and nucleosomes all 
bind tightly to nonspecific DNA, whereas LacI binds much more weakly 
to nonspecific sequences (Fig. 4b), suggesting that LacI is released more 
rapidly from DNA after the collisions due to its weaker affinity for 
nonspecific sites. This result demonstrates that the roadblock proteins 
and the nature of their interactions with nonspecific DNA are critical 
contributing factors to the outcome of the collisions. 

This leaves the question of how much force RecBCD exerts, and 
how much is sufficient to disrupt obstacles. Although our experiments 
do not yield a direct read-out of force, we can safely conclude that the 
force exerted by RecBCD is sufficient to displace RNAP, EcoRI(E111Q), 
LacI and nucleosomes from DNA. Our work has revealed unpreced- 
ented details of protein collisions on DNA and provides new insights 
into how translocases can disrupt nucleoprotein complexes. Given the 
flexibility of our experimental platform, we anticipate that these studies 
can be extended to other translocases and roadblock proteins, and it 
will be important to determine whether the mechanistic concepts 
developed here apply to different types of collision between proteins 
on DNA. 


METHODS SUMMARY 


We conducted total-internal-reflection fluorescence microscopy experiments on a 
home-built microscope using nanofabricated DNA curtains, as previously 
described. For all initial experiments, and for all kymograms shown in the manu- 
script, we used YOYO-1 to stain the DNA. YOYO-1 does not affect the transloca- 
tion rate or processivity of RecBCD, and it did not affect the binding distributions 
of RNAP, EcoRI(E111Q) or nucleosomes (not shown). In the presence of YOYO-1, 
the roadblock proteins showed the same general response to collisions with 
RecBCD, with identical distributions of ejection, stalling and pushing (and push- 
ing velocities) seen with and without YOYO-1. However, the stain reduced the 
distance obstacles were pushed by 20-30%. Therefore, all sliding distances and 
half-lives reported here correspond to values measured in the absence of YOYO-1. 
Sliding distances are reported only for roadblock proteins that did not encounter 
any other quantum-dot-tagged proteins as they were pushed along the DNA. This 
ensures that each analysed collision/dissociation event involved only a single 
quantum-dot-tagged protein. Many reactions were observed in which multiple 
quantum-dot-tagged roadblock proteins were pushed into one another, but in 
these cases we could not determine the order in which each such protein was 
displaced from the DNA, and therefore could not measure sliding distances. To 
categorize the distributions of event type, we defined ‘sliding’ as the movement of 
any quantum-dot-tagged roadblock by more than 0.53 µm (~1,950 bp); anything 
less than this was scored as a direct dissociation event. 
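The event-scoring rule described above amounts to a simple threshold on displacement. The following minimal sketch uses illustrative function names; the conversion factor is derived from the stated equivalence of 0.53 µm and roughly 1,950 bp, and a displacement exactly at the threshold is scored here as direct dissociation, which is an assumption because the text does not specify that case.

```python
SLIDING_THRESHOLD_UM = 0.53       # threshold stated in the text
THRESHOLD_BP = 1950               # approximate equivalent stated in the text
BP_PER_UM = THRESHOLD_BP / SLIDING_THRESHOLD_UM  # implied conversion factor


def um_to_bp(displacement_um):
    """Convert an observed displacement in micrometres to base pairs."""
    return displacement_um * BP_PER_UM


def classify_event(displacement_um):
    """Score an event as 'sliding' only if it exceeds the threshold."""
    if displacement_um > SLIDING_THRESHOLD_UM:
        return "sliding"
    return "direct dissociation"


if __name__ == "__main__":
    for d_um in (0.2, 0.53, 0.8, 2.5):
        print(f"{d_um:.2f} um (~{um_to_bp(d_um):.0f} bp): {classify_event(d_um)}")
```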
The mechanism of sodium and substrate release from 
the binding pocket of vSGLT 


Akira Watanabe, Seungho Choe, Vincent Chaptal, John M. Rosenberg, Ernest M. Wright, Michael Grabe & Jeff Abramson 


Membrane co-transport proteins that use a five-helix inverted 
repeat motif have recently emerged as one of the largest structural 
classes of secondary active transporters’. However, despite many 
structural advances there is no clear evidence of how ion and sub- 
strate transport are coupled. Here we report a comprehensive study 
of the sodium/galactose transporter from Vibrio parahaemolyticus 
(vSGLT), consisting of molecular dynamics simulations, biochemical 
characterization and a new crystal structure of the inward-open con- 
formation at a resolution of 2.7 Å. Our data show that sodium exit 
causes a reorientation of transmembrane helix 1 that opens an inner 
gate required for substrate exit, and also triggers minor rigid-body 
movements in two sets of transmembrane helical bundles. This 
cascade of events, initiated by sodium release, ensures proper timing 
of ion and substrate release. Once set in motion, these molecular 
changes weaken substrate binding to the transporter and allow galac- 
tose readily to enter the intracellular space. Additionally, we identify 
an allosteric pathway between the sodium-binding sites, the unwound 
portion of transmembrane helix 1 and the substrate-binding site that 
is essential in the coupling of co-transport. 

Secondary active transporters harness the energy stored in electro- 
chemical gradients to drive the accumulation of specific solutes across 
cell membranes. This task is accomplished by the alternating-access 
mechanism, in which the substrate-binding site is first exposed to one 
side of the membrane and, on ion and substrate binding, a conforma- 
tional change exposes the transported solute to the opposite face, 
where it is released. Sodium/glucose co-transporters are prototypes 
of secondary active transporters that drive the accumulation of sugars 
and other molecules into cells. These transporters have critical roles in 
human physiology: mutations in their genes are responsible for 
severe congenital diseases, and the transporters are molecular targets 
for drugs to treat diabetes and obesity. 

There has been a recent surge of work on crystal structures displaying 
the five-helix inverted repeat motif. These are referred to as the 
LeuT superfamily and include genetically diverse proteins that transport 
a wide range of substrates and differ in the number and type of driving 
ligand. A general model for alternating access is being pieced together 
through comparisons of these diverse structures. Although these proteins 
share a common set of ten core transmembrane segments, their lack of 
sequence similarity and the chemical diversity of the transported 
substrates prevent a complete understanding of the mechanistic basis of 
transport. This hurdle is being surmounted as multiple structures of the 
same protein, at different stages in the transport cycle, are solved, 
providing a comprehensive understanding of substrate binding and the 
transition from outward- to inward-facing conformations. However, an 
atomic-level understanding of sodium-coupled substrate co-transport, 
necessary to explain the dynamics of alternating access, is still absent. 

To investigate the mechanism of sodium-sugar coupling, we carried 
out molecular dynamics simulations on the galactose-bound inward-occluded 
conformation of vSGLT embedded in a lipid bilayer. All 
sodium co-transporters of the LeuT superfamily share a common 
sodium-binding site termed the Na2 site. During the transition to 
the inward-facing conformation, transmembrane helix (TM) 8, which 
forms part of the sodium-binding site, is displaced by ~4 Å, generating 
a less favourable Na2 site that facilitates Na+ release (Fig. 1b). Na+ 
modelled at this site is loosely coordinated by the carbonyl oxygens of 
Ile 65 (3.3 Å), Ala 361 (3.2 Å) and the side-chain hydroxyl of Ser 365 
(3.1 Å). The carbonyl oxygen of Ala 62 (3.6 Å) and the side-chain 
hydroxyl of Ser 364 (3.6 Å) are also in close proximity (Fig. 1b). 
Previous molecular dynamics simulations performed on vSGLT 
and Mhp1 indicated that Na+ quickly leaves the Na2 site. Our simulations 
indicate that Na+ exits the Na2 site after 9 ns (Fig. 2a) and 
interacts with the hydrophilic pore-lining residue Asp 189 on TM5 
during exit. The importance of Asp 189 was highlighted in a previous 
simulation and in biochemical studies on hSGLT1. All three 
molecular dynamics simulations indicate that Na+ exits the transporter 
before substrate exit; however, additional conformational 
changes are required to release the occluded galactose. 

In the inward-occluded structure, galactose is located halfway across 
the membrane (Figs 1a and 2b), where it is coordinated by extensive 
side-chain interactions from TM1, TM2, TM6, TM7 and TM10. Subsets 
of these residues form two hydrophobic gates blocking galactose exit to 
the intracellular and extracellular spaces. Our molecular dynamics 
simulations show that as Na+ exits the Na2 site, galactose undergoes 
significant fluctuations within the binding pocket. At 52 ns, Tyr 263 
adopts a new and stable rotamer conformation that expands the exit 
pathway (between 52 and 110 ns), permitting the sugar to leave the 
binding site (Figs 2 and 3). After sugar release (~110 ns), Tyr 263 
returns to the original conformation. 
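
Exit times such as the 9 ns quoted for the ion and the ~110 ns quoted for the sugar are read from per-frame displacement traces. The following is a minimal sketch of one way to extract such a time. The 5 Å cutoff, the three-frame persistence rule and the function name are illustrative assumptions, not the criteria used in this study.

```python
import numpy as np


def first_exit_time(times_ns, displacement_A, cutoff_A=5.0, hold_frames=3):
    """Return the first time at which the displacement rises above
    cutoff_A and stays there for hold_frames consecutive frames.

    Requiring persistence avoids counting a momentary excursion
    (an ion that leaves the site briefly and then returns) as an exit.
    Returns None if no persistent exit is found.
    """
    above = np.asarray(displacement_A, dtype=float) > cutoff_A
    run = 0
    for i, flag in enumerate(above):
        run = run + 1 if flag else 0
        if run == hold_frames:
            return float(times_ns[i - hold_frames + 1])
    return None


# Synthetic trace (hypothetical values): a one-frame spike at 5 ns,
# followed by a persistent departure starting at 9 ns.
times = np.arange(0.0, 20.0, 1.0)
trace = np.where(times < 9.0, 1.0, 10.0)
trace[5] = 10.0
exit_ns = first_exit_time(times, trace)
```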

To test the hypothesis that Na+ release stimulates an alternative 
conformation of Tyr 263, we conducted a 200-ns molecular dynamics 
simulation in which the sodium was lightly restrained in the Na2 site. Under 
these conditions, Tyr 263 never adopts the alternative conformation, and 
thereby prevents galactose exit (Supplementary Fig. 1). This observation 
suggests that sodium release drives conformational changes that disrupt 
the galactose-binding site, and further suggests that interactions between 
the Na2 site and Tyr 263 are central to the transport mechanism. 

The spontaneous release of galactose in the absence of applied forces 
makes possible the accurate determination of the binding free energy 
profile through the use of umbrella sampling along the exit pathway 
coupled with weighted histogram analysis (Fig. 3, inset). After Na+ 
release, galactose is weakly bound to vSGLT with a minimal energy 
barrier of ~2 kcal mol⁻¹, resulting from the interaction of the sugar 
with residues Asn 64, Ser 66, Glu 68 and Gln 69 on TM1. Asn 64 is of 
particular interest because it is located in the unwound segment of 
TM1 and has hydrogen bonds with the inner gate residue Tyr 263 and 
the O2 hydroxyl of galactose, linking the Na2 site with the galactose 
site. Thus, the interactions of Asn 64 with Tyr 263 and galactose may 
be critical to the transport mechanism. 
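
To put the ~2 kcal mol⁻¹ barrier in context, it can be compared with the thermal energy at the 310 K simulation temperature given in Methods. The short calculation below is only an illustrative sketch. The variable names are hypothetical, and the Boltzmann factor is a rough equilibrium weight, not a rate estimate made by the authors.

```python
import math

K_B = 0.0019872041   # Boltzmann constant in kcal mol^-1 K^-1
TEMPERATURE_K = 310.0  # simulation temperature stated in Methods
BARRIER_KCAL = 2.0     # approximate barrier height quoted in the text

kT = K_B * TEMPERATURE_K                    # thermal energy, kcal/mol
barrier_in_kT = BARRIER_KCAL / kT           # barrier in units of kT
boltzmann_factor = math.exp(-barrier_in_kT)  # relative weight of the barrier top

print(f"kT = {kT:.3f} kcal/mol, barrier = {barrier_in_kT:.1f} kT, "
      f"exp(-dG/kT) = {boltzmann_factor:.3f}")
```

A barrier of only about three times the thermal energy is consistent with the description of galactose as weakly bound once the ion has left.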

To test the importance of these interactions, we performed molecular 
dynamics simulations and sodium-dependent transport assays on 
transporters with mutations at positions 64 and 263. Simulations of the 
Asn 64 Ala mutant show a momentary sodium departure from the Na2 
site at 5 ns, but the ion rapidly returns and remains for the remainder of 
the simulation. The failure of Na+ to unbind prevents conformational 
changes in the unwound segment of TM1, and Tyr 263 remains in the 
blocked orientation (Supplementary Fig. 2a). In agreement with the 
simulation, sodium-dependent transport assays on the Asn 64 Ala 
mutant show no activity (Fig. 2c). 

Figure 1 | Structures and overlay of the inward-open and inward-occluded 
conformations. a, The core domain of the inward-open conformation (TM1- 
TM10) is coloured by specific helix bundles involved in the transition from the 
inward-occluded to the inward-open conformation. The ‘hash motif’ formed 
from TM3, TM4, TM8 and TM9 is blue; the ‘sugar bundle’ formed from TM2, 
TM6 and TM7 is green; TM1 is red; and TM5 and TM10 are magenta. The 
periphery helices (TM-1, TM11, TM12 and TM13) are yellow. Atoms are 
displayed in ball-and-stick form with oxygen coloured red and nitrogen 
coloured blue. Inset, an overlay of the inward-open (colour) and inward- 
occluded (grey) conformations illustrating the coordination at the Na2 and 
galactose-binding sites. b, c, Overlay of the inward-open and inward-occluded 
conformations with the same colouring as in a. Conformational changes in the 
inward-open structure reveal a ~13° kink in the unwound segment of TM1 
that prevents sodium coordination at the Na2 site (b). In the absence of 
galactose, the galactose-binding residue Asn 64 hydrogen-bonds to Glu 88 and 
Tyr 263, maintaining an open pathway from the intracellular space to the 
substrate-binding site (c). 

1Department of Physiology, University of California, Los Angeles, Los Angeles, California 90095-1759, USA. 2Department of Biological Sciences, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, 
USA. 3Department of Computational and Systems Biology, University of Pittsburgh, Pittsburgh, Pennsylvania 15260, USA. 
*These authors contributed equally to this work. 

988 | NATURE | VOL 468 | 16 DECEMBER 2010 

©2010 Macmillan Publishers Limited. All rights reserved 

To explore the role of Asn 64 further, we tested Asn 64 Gln and 
Asn 64 Ser, which, in principle, are both capable of maintaining the 
native hydrogen bonds to Tyr 263 and galactose. Models of Asn 64 Gln 
prevented simulation as a result of substantial steric clashes, which 
correlated well with a lack of transport (Fig. 2c). The model of the 
Asn 64 Ser mutation could form a hydrogen bond with Tyr 263 (3.3 Å) 
but not with the O2 hydroxyl of galactose (4.3 Å). In the simulation, 
Asn 64 Ser releases Na+ from the Na2 site at 25 ns, and Tyr 263 
transiently adopts the alternative rotamer conformation before returning 
to its original position, preventing galactose exit (Supplementary Fig. 
2c). Similarly, simulation of the Tyr 263 Phe mutant shows that Na+ 
unbinds at 10 ns, but unlike tyrosine, phenylalanine never adopts a 
conformation compatible with galactose exit (Supplementary Fig. 2b). 
Longer simulations may reveal galactose release in these mutants, 
because they both show modest transport activity (~10% of wild type); 
however, the transport assays and simulation data demonstrate that 
robust transport requires precise orientation of Asn 64 to stabilize 
galactose and the gating residue Tyr 263. 

Figure 2 | Mechanism of galactose release. a, Sodium and galactose exit 
vSGLT. The root mean squared deviation (r.m.s.d.) of Na+ (green) rapidly 
increases at 9 ns, indicating exit from the Na2 site. This is followed by the release 
of galactose (red) at 110 ns. b, Tyr 263 adopts two rotamers. On the left, Tyr 263 
is shown in the conformation observed in the inward-occluded structure, in 
which it blocks substrate exit through a hydrogen bond with Asn 64 on TM1. At 
52 ns (shown on the right), Tyr 263 adopts a rotamer conformation that 
expands the exit pathway. c, D-galactose uptake by wild-type and mutant 
vSGLT in proteoliposomes. Results are expressed as percentage uptake in 
either 100 mM NaCl or KCl, and show that the mutants Asn 64 Ala, Asn 64 Ser, 
Asn 64 Gln and Tyr 263 Phe severely impair sodium-dependent transport. 
Error bars, s.e.m. WT, wild type. 

Although the molecular dynamics simulations and biochemical 
studies demonstrate a physical link between the Na2 site and the 
substrate, global details regarding the inward-open conformation 
(devoid of both ligands) remain elusive. To address this issue, we 
determined the structure of vSGLT in the inward-open conformation. 
Crystals were obtained in the absence of ligands for both the wild-type 
protein and the inactive Lys 294 Ala mutant. Both crystals had the 
same overall configuration, but the mutant crystals diffracted to a 
higher resolution (2.7 Å; see Methods). 

As in the original structure, the inward-open conformation is composed 
of 14 transmembrane helices, ten of which comprise the core 
domain. TM1-TM5 and TM6-TM10 are related by an approximate 
two-fold symmetry axis through the centre of the membrane plane. 
The inward-occluded and the inward-open structures have a similar 
overall fold with a r.m.s.d. of 1.2 Å. However, there are distinct 
structural differences between the two conformations, presumably owing to 
changes resulting from the release of ligands (Fig. 4). With the exception 
of TM1, superimpositions of individual helices reveal that the 
occluded-to-open transition occurs by rigid-body movements of subdomains 
(Supplementary Figs 3 and 4). Consistent with the recent 
assignment for Mhp1, the hash motifs, formed from TM3 and TM4 
and their inverted repeat equivalents, TM8 and TM9, align with a 
r.m.s.d. of 0.9 Å. TM2, TM6 and TM7 form a domain termed the sugar 
bundle for the extensive side-chain interactions with galactose, and 
these regions superimpose with a r.m.s.d. of 0.5 Å. This new inward-open 
structure of vSGLT is more similar to the recent inward-facing 
conformation structure of Mhp1 than is the previous structure of 
vSGLT. Details of this structural analysis are in Methods. 

Figure 3 | The potential of mean force for galactose unbinding. Energy of 
galactose binding to vSGLT in the absence of Na+. Umbrella sampling along 
the natural, equilibrium pathway shown (inset) was used to determine the 
binding free energy. The distance along the pathway from the binding site in the 
X-ray structure is shown along the x axis. The coloured arrows correspond to 
the galactose positions shown in the inset. The largest barrier is ~2 kcal mol⁻¹, 
at 5 Å, which corresponds to galactose interaction with residues in the kink 
region of TM1. Error bars were determined by splitting the production data 
into four equal sets, computing the energy profile for each set, and then 
applying a global shift to each curve before calculating the standard deviation at 
the 16 positions marked with points. 

The transition from the inward-occluded to the inward-open structure 
is presumably triggered by sodium release from the Na2 site and 
the alteration in the hydrogen-bonding network surrounding the 
unwound segment of TM1. In particular, the intracellular half of 
TM1 flexes ~13°, modifying the coordination of Asn 64 (Figs 1b 
and 4a). In the absence of both galactose and Na+, Asn 64 coordinates 
Tyr 263 and Glu 88. Glu 88 was previously hydrogen-bonded to the O2 
and O3 hydroxyls of galactose. This new conformation of TM1 is 
further stabilized by hydrogen bonds between the Na2-site residue 
Ser 365 and Glu 68 on the unwound segment of TM1 (Supplementary 
Fig. 5). When viewed from the intracellular side, each domain 
moves ~3° in opposite directions, thereby increasing the volume of the 
accessibility cavity by ~1,400 Å³ (Fig. 4). This 6° relative rotation 
probably disrupts protein-substrate coordination and permits water 
to enter the site. This interpretation is supported by our simulations, in 
which an increase in the number of water molecules in the substrate- 
binding site is observed after sodium release (Supplementary Fig. 6). 
Water effectively competes with the protein for hydrogen bonds, 
loosening galactose in the pocket and ultimately assisting in its release 
(Supplementary Fig. 6 and Supplementary Movie 1). 
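
The step from two ~3° rigid-body movements in opposite directions to a 6° relative rotation can be checked by composing rotation matrices. The sketch below assumes, purely for illustration, that both domains rotate about one shared axis (taken here as z). The helper names are hypothetical, and the real rotation axes in the structure need not coincide.

```python
import numpy as np


def rot_z(angle_deg):
    """Rotation matrix for a rotation of angle_deg about the z axis."""
    t = np.radians(angle_deg)
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0],
                     [s, c, 0.0],
                     [0.0, 0.0, 1.0]])


def rotation_angle_deg(R):
    """Rotation angle of a 3x3 rotation matrix, from its trace."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    return float(np.degrees(np.arccos(cos_theta)))


hash_motif = rot_z(+3.0)    # one domain turns by +3 degrees
sugar_bundle = rot_z(-3.0)  # the other turns by -3 degrees

# Rotation of the hash motif as seen from the frame of the sugar bundle.
relative = hash_motif @ sugar_bundle.T
relative_angle = rotation_angle_deg(relative)
```

If the two axes were tilted with respect to each other, the relative angle would be smaller than the sum of the individual rotations.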

We propose the following mechanism for sodium and galactose exit 
from vSGLT. The transition from the outward- to the inward-occluded 
conformation weakens the Na2 sodium-binding site, causing 
it to become metastable and release the ion on a short timescale. Upon 
exit, the carbonyl oxygens of the ion-coordinating residues Ile 65 and 
Ala 62 undergo a conformational change in the unwound segment of 
TM1, producing a kink of ~13° (Figs 1 and 4a). Our simulation shows 
that movement of TM1 disrupts the hydrogen bond between Asn 64 
and Tyr 263, allowing the side chain of Tyr 263 to adopt a new 
conformation that opens a pathway to the intracellular space (Fig. 2b). 
Additional rigid-body movements widen the intracellular cavity, 
allowing water penetration and further disrupting the substrate-binding 
site to enhance exit and prevent rebinding (Fig. 4). 

Figure 4 | Conformational changes in the transition from the inward- 
occluded to the inward-open structure. a, TM1 superimposed between the 
inward-open (red) and inward-occluded (grey) structures, showing a ~13° 
kink in TM1. b, Overlay of the inward-open (coloured as in Fig. 1) and inward- 
occluded (grey) conformations. Rigid-body rotations of the hash motif and 
sugar bundle by 3° in opposite directions expose the substrate-binding site to 
the intracellular environment. c, Accessibility cavity of the inward-occluded 
conformation is coloured blue. d, Accessibility cavity of the inward-open 
conformation is coloured gold. The conformational changes from TM1, hash 
motif and sugar bundle cause an increase of ~1,400 Å³ in the accessible volume 
of the inward-open conformation, aiding galactose release. 

It is likely that the reaction scheme described here for vSGLT is 
broadly used by all sodium-dependent members of the LeuT superfamily, 
because the Na2 site, the hydrophobic gates and the unwound 
segments on TM1 and TM6 are all conserved. For proteins with 
a single sodium-binding site, the Na2 site directly interacts with the 
substrate through polar residues (Asn 64 in vSGLT and Gln 42 in 
Mhp1) located on the unwound segment of TM1. For proteins that 
harbour two sodium-binding sites, such as LeuT and, putatively, BetP, 
the additional site (the Na1 site) is positioned on the opposite side of 
the unwound helix from the Na2 site. Interactions between the Na1 
and Na2 sites are mediated by the unwound segment of TM1, and the 
sodium at the Na1 site is directly coordinated to the substrate. 
Regardless of whether the protein has one or two sodium-binding sites, 
it is the conserved Na2 site, positioned most distal from the core of the 
protein, that regulates sodium and substrate release. This primary 
structural feature coupling sodium and substrate co-transport has 
fundamental implications for our understanding of membrane protein 
biology and for developing strategies to manipulate the alternating- 
access mechanism therapeutically. 
METHODS SUMMARY 

Molecular dynamics simulations. The vSGLT monomer (Protein Data Bank ID, 
3DH4) was embedded and solvated in a 1-palmitoyl-2-oleoyl phosphatidylcholine 
membrane bilayer using the OPM and CHARMM-GUI software packages. 
Simulations were carried out using NAMD with the CHARMM27 parameter 
set in a 150 mM NaCl bath. See Methods for more details. 

Protein expression and purification. Plasmids carrying wild-type or mutant 
transporters were transformed and overexpressed in the TOP10 Escherichia coli cell 
line. Cell membranes were isolated, solubilized (2% w/v decyl-β-D-maltopyranoside) 
and tandem-purified using a Ni-NTA Superflow column (affinity chromatography) 
and a Superdex 200 column (size-exclusion chromatography). See Methods for 
more details. 

Transport assays. We generated proteoliposomes by reconstituting purified 
vSGLT protein with sonicated lipid at a protein/lipid ratio of 1:200. We measured 
transport activity by monitoring the uptake of D-galactose, with 14C-D-galactose 
tracer, into proteoliposomes in the presence or absence of a 100 mM Na+ gradient 
(K+ replacing Na+). See Methods for more details. 

Crystallization and data collection. We concentrated purified wild-type and 
Lys 294 Ala protein to ~13 mg ml⁻¹ and grew crystals by the hanging-drop 
vapour diffusion method using the Mosquito nanolitre-dispensing robot. Data 
collected at the Advanced Light Source, Berkeley (beamline 5.0.2), were integrated 
and scaled, and phases were calculated by molecular replacement. The model was 
built and refined to an Rwork/Rfree value of 25.1/27.4. See Methods for more details. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 


Molecular dynamics simulations. Initially, TM-1 was removed and six missing 
residues in the TM4-TM5 loop were added with the loop modelling routine in 
Modeller. Residues 53-547 were then embedded in a membrane and solvated in 
a hexagonal box approximately 96 × 96 × 84 Å³ in volume for a total of 63,000 
atoms. Electroneutrality was enforced with the addition of 150 mM NaCl. 

Simulations were carried out with the CMAP-corrected CHARMM27 parameter 
set and the TIP3P water model. VMD and MATLAB were used for 
visualization and analysis. The system was minimized using conjugate gradient 
minimization and heated to 310 K using Langevin dynamics with a 10-ps⁻¹ 
damping coefficient. An initial 300-ps equilibration using the NVT ensemble 
was carried out in which water, galactose, Na+ and heavy backbone and side-chain 
atoms were constrained in a harmonic potential with a force constant of 
k = 10.0 kcal mol⁻¹ Å⁻². We then switched to an NPT ensemble, and the 
restraints on the water molecules and heavy side-chain atoms were gradually 
removed in five steps over 1.5 ns. All remaining restraints were removed in six 
steps over the next 1.8 ns. Finally, 10 ns of restraint-free simulation was run. All 
production runs start from this equilibrated system. A Langevin piston with a 200-fs 
period and 100-fs decay was used to set the pressure to 1 atm. Hydrogen bond 
lengths were constrained with SHAKE, and a 2-fs time step was used. A 10 Å van 
der Waals cut-off was used along with the particle mesh Ewald method for the 
electrostatics. 

Simulations with the restrained ion were carried out by holding the Na+ in a weak 
spherical harmonic potential using the distance from the Na+ to the centre of mass 
(COM) of the five coordinating residues. The equilibrium distance, based on the 
inward-occluded structure, was 1.39 Å, and a force constant of k = 0.5 kcal mol⁻¹ Å⁻² 
produced minimal distortions in the protein. A 10-ns equilibration was run before a 
200-ns production run. The r.m.s.d. of the entire protein was 2.5 Å in the 
unrestrained simulation and 3.0 Å in the restrained-Na+ simulation. 
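
The restraint described above depends only on the distance between the ion and the centre of mass of the coordinating residues. The following is a minimal sketch of that energy term using the stated constants. The functional form U = ½k(d − d₀)², the function names and the toy coordinates are assumptions for illustration. The simulation package may define the force constant without the factor of ½.

```python
import numpy as np

K_RESTRAINT = 0.5  # kcal mol^-1 A^-2, force constant stated in the text
D_EQ = 1.39        # A, equilibrium ion-to-COM distance stated in the text


def center_of_mass(coords, masses):
    """Mass-weighted mean position of a group of atoms."""
    coords = np.asarray(coords, dtype=float)
    masses = np.asarray(masses, dtype=float)
    return (coords * masses[:, None]).sum(axis=0) / masses.sum()


def restraint_energy(ion_xyz, site_coords, site_masses,
                     k=K_RESTRAINT, d0=D_EQ):
    """Harmonic energy (kcal/mol) on the ion-to-COM distance.

    Assumed form: 0.5 * k * (d - d0)**2.
    """
    com = center_of_mass(site_coords, site_masses)
    d = np.linalg.norm(np.asarray(ion_xyz, dtype=float) - com)
    return 0.5 * k * (d - d0) ** 2


# Toy site of two equal-mass atoms (hypothetical), COM at (1, 0, 0).
site = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0]]
masses = [16.0, 16.0]
e_at_rest = restraint_energy([1.0, 0.0, 1.39], site, masses)
e_displaced = restraint_energy([1.0, 0.0, 3.39], site, masses)
```

With this form, moving the ion 2 Å beyond the equilibrium distance costs 1 kcal mol⁻¹, which illustrates why the restraint is described as weak.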

Potential of mean force calculation. The potential of mean force (PMF) was 
calculated using umbrella sampling with WHAM. We extracted 69 snapshots 
along the pathway and held each configuration in a harmonic potential 
(k = 7.0 kcal mol⁻¹ Å⁻²) with a resting length equal to the z component of the 
distance between the galactose COM and the binding-site COM defined by the 
binding-site residues. Two-nanosecond trajectories were run for each umbrella, 
and the last 1,800 ps were used for calculating the PMF. Splitting the trajectories 
into two equal parts (200-1,000 ps and 1,200-2,000 ps) and computing separate 
PMFs revealed that the total PMF is well converged. 
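
The way umbrella windows are combined into one profile can be illustrated with a small one-dimensional WHAM iteration. This is a minimal sketch on synthetic data, not the WHAM program used in this work. The bias form ½k(x − c)², the 21 windows, the harmonic test potential and all names are assumptions chosen so that the true answer is known.

```python
import numpy as np

KT_310K = 0.0019872 * 310.0  # thermal energy at 310 K, kcal/mol


def wham_1d(samples, centers, k, bin_edges, kT=KT_310K,
            n_iter=5000, tol=1e-8):
    """Combine biased window samples into one PMF on a 1D grid.

    Assumes window j was biased by U_j(x) = 0.5 * k * (x - center_j)**2.
    Returns bin midpoints and the PMF (kcal/mol, minimum set to zero).
    """
    centers = np.asarray(centers, dtype=float)
    mids = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    # Histogram every window on the shared grid: (n_windows, n_bins).
    hist = np.array([np.histogram(s, bins=bin_edges)[0] for s in samples],
                    dtype=float)
    n_per_window = hist.sum(axis=1)
    bias = 0.5 * k * (mids[None, :] - centers[:, None]) ** 2
    f = np.zeros(len(centers))  # per-window free-energy offsets
    for _ in range(n_iter):
        # Unbiased probability estimate given the current offsets.
        weights = n_per_window[:, None] * np.exp((f[:, None] - bias) / kT)
        p = hist.sum(axis=0) / weights.sum(axis=0)
        # Offsets implied by that probability estimate.
        f_new = -kT * np.log((p[None, :] * np.exp(-bias / kT)).sum(axis=1))
        f_new -= f_new[0]
        converged = np.max(np.abs(f_new - f)) < tol
        f = f_new
        if converged:
            break
    with np.errstate(divide="ignore"):
        pmf = -kT * np.log(p)  # empty bins become +inf
    return mids, pmf - pmf[np.isfinite(pmf)].min()


# Synthetic test: true PMF is harmonic, 0.5 * K0 * (x - X0)**2, so each
# biased window is exactly Gaussian and can be sampled directly.
rng = np.random.default_rng(0)
K0, X0, K_UMBRELLA = 1.0, 2.5, 7.0
window_centers = np.arange(0.0, 5.01, 0.25)
k_total = K0 + K_UMBRELLA
samples = [rng.normal((K0 * X0 + K_UMBRELLA * c) / k_total,
                      np.sqrt(KT_310K / k_total), size=5000)
           for c in window_centers]
z, pmf = wham_1d(samples, window_centers, K_UMBRELLA,
                 np.linspace(0.0, 5.0, 51))
```

Running the same function on two halves of each window's data and comparing the resulting profiles mirrors the convergence check described above.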

Protein purification. vSGLT proteins were cloned, expressed and purified as 
previously described. Briefly, the plasmids were transformed into the 
TOP10 cell line, expressed to OD600 1.8 and induced with 0.66 mM L-arabinose 
for 4 h at 29 °C. Cell membranes were isolated, solubilized with 2% 
decyl-β-D-maltopyranoside and affinity-purified on a Ni-NTA column. The sample was 
further purified by size-exclusion chromatography (Superdex 200) and washed 
with crystal buffer (20 mM Tris (pH 7.5), 25 mM NaCl, 0.174% 
decyl-β-D-maltopyranoside) in a 50-kDa Amicon filter unit. 

Transport assays. Mutants were created with the QuikChange method and purified 
as above. vSGLT protein was reconstituted in 150 mM KCl, 10 mM Tris/ 
Hepes (pH 8.0), 1 mM DTT, 1 mM Na2EDTA, 1 mM CaCl2, 1 mM MgCl2 and 
0.5% decyl-β-D-maltopyranoside, with 1.2 mg ml⁻¹ sonicated lipid (90 mg asolectin 
soy lecithin, 10 mg cholesterol) at a protein/lipid ratio of 1:200. Addition of 5 mg ml⁻¹ 
SM-2 Bio-Beads initiated the reconstitution and the mixture was incubated overnight 
at 4 °C. The proteoliposomes were collected and washed twice by centrifugation. 
Pelleted proteoliposomes were resuspended and underwent three freeze-thaw cycles 
in liquid nitrogen. 

Uptake of D-galactose (88 μM) with 14C-D-galactose tracer into proteoliposomes 
was measured for 18 min at 22 °C in the presence or absence of 100 mM 
Na+ (K+ replacing Na+) as described previously. Proteoliposomes were collected 
by filtration through 0.45-μm Millipore filters and the uptake was quantified 
by scintillation counting. Results are expressed as the mean ± s.e.m. of three 
determinations and three trials. 
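
Uptake results of this kind are summarized as a mean and a standard error of the mean (s.e.m.). The following is a minimal sketch of that summary using only the standard library. The function names are hypothetical and the replicate values are made up for illustration, not data from this study.

```python
import math


def mean_sem(values):
    """Return (mean, s.e.m.) for a list of replicate measurements.

    The s.e.m. is the sample standard deviation (n - 1 denominator)
    divided by the square root of the number of replicates.
    """
    n = len(values)
    if n < 2:
        raise ValueError("at least two replicates are needed for an s.e.m.")
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / (n - 1)
    return mean, math.sqrt(variance / n)


def percent_of_reference(value, reference):
    """Express an uptake value as a percentage of a reference uptake."""
    return 100.0 * value / reference


# Hypothetical replicate uptakes in arbitrary units.
wild_type_mean, wild_type_sem = mean_sem([98.0, 100.0, 102.0])
mutant_mean, mutant_sem = mean_sem([9.0, 10.0, 11.0])
mutant_percent = percent_of_reference(mutant_mean, wild_type_mean)
```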

Crystallization. Protein was concentrated to ~13 mg ml⁻¹ before plating. 
Optimization by additive screening gave the best diffracting crystals with a reservoir 
solution containing 0.1 M MES (pH 6.5), 4% MPD and 9-13% PEG400, and 
tridecyl-β-D-maltopyranoside to a final concentration of 0.0017% as an additive. 
Before freezing, crystals were cryoprotected using a solution containing 30% 
PEG400 and 0.174% decyl-β-D-maltopyranoside. 

Data processing, phasing and refinement. Data were collected at 1.0 Å on 
cryo-cooled crystals (100 K) at the Advanced Light Source (beamline 5.0.2). Five data sets 
were integrated using HKL2000, merged and subjected to B-factor sharpening 
using an anisotropy correction server (resolution cut-offs: a = 3.1 Å, b = 2.7 Å 
and c = 2.8 Å). Phases were calculated by molecular replacement (PHASER) 
using the original vSGLT structure as a search model. The model was built in 
COOT and refined using PHENIX and BUSTER using non-crystallographic 
symmetry (NCS) and TLS refinement restraints. There are two molecules per 
asymmetric unit with the A molecule displaying sharper electron density and lower 
B factors (88.5 Å²) than the B molecule (131.3 Å²). The model was built and refined 
to an Rwork/Rfree value of 25.1/27.4. The Ramachandran statistics are as 
follows: 95.5% of the residues lie in the preferred region, 4.3% lie in the allowed 
region and 0.2% are outliers. 

The 2Fo-Fc maps contained three elongated features having a maximal peak 
height of 3σ. These attributes were interpreted and assigned as PEG molecules. 
Two are located at the periphery, whereas the third is near the Na2 site as observed 
in the Mhp1 structure and is proposed to stabilize the inward-facing conformation. 

The Lys 294 Ala protein crystals diffract to higher resolution than the wild-type 
crystals. Data from four wild-type crystals were collected and merged to achieve a 
3.7 Å resolution data set. Difference Fourier maps were calculated against the final 
Lys 294 Ala mutant model and no significant peaks were observed. The 
Lys 294 Ala model was further refined using PHENIX to yield an Rwork/Rfree value 
of 30.7/34.8. 

We note that although refinement was carried out with data subject to 
anisotropic correction, as described above, the deposited data have not been treated. 
Figures were created from the A-chain protomer using PYMOL. 

Structural comparison of vSGLT with Mhp1. Superpositions of the inward- 
occluded (Protein Data Bank ID, 3DH4) and inward-open conformations of 
vSGLT with the inward-facing conformation of Mhp1 (Protein Data Bank ID, 
2X79) reveal they all share a similar global fold. The largest differences are centred 
near the substrate- and ion-binding sites. The Na2-site helices (TM1, TM5 and 
TM8) of the inward-open conformation of vSGLT have a closer fit to Mhp1 
(r.m.s.d., 2.2 Å) than the inward-occluded conformation (r.m.s.d., 2.6 Å); thus, 
the inward-open vSGLT structure more closely resembles the structure of Mhp1. 
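
R.m.s.d. values of this kind are computed after optimally superimposing two sets of equivalent atoms. The following is a minimal sketch using the Kabsch algorithm in NumPy. It is a generic illustration, not the superposition software used in this study, and it assumes the two coordinate arrays are already matched atom for atom.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """R.m.s.d. between two (N, 3) coordinate sets after optimal
    rigid-body superposition of P onto Q (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    # Remove translation by centring both sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    P_rot = Pc @ R.T
    return float(np.sqrt(((P_rot - Qc) ** 2).sum() / len(P)))


# Synthetic check (hypothetical coordinates): a rotated and translated
# copy of a structure should superimpose with an r.m.s.d. of zero.
rng = np.random.default_rng(1)
coords = rng.normal(size=(30, 3)) * 10.0
angle = np.radians(40.0)
rotation = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                     [np.sin(angle), np.cos(angle), 0.0],
                     [0.0, 0.0, 1.0]])
moved = coords @ rotation.T + np.array([5.0, -3.0, 12.0])
rmsd_moved = kabsch_rmsd(coords, moved)
rmsd_noisy = kabsch_rmsd(coords, moved + rng.normal(scale=0.5, size=coords.shape))
```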


Networking websites offer an increasing number of features, including virtual conferences, to entice researchers. 


COLLABORATION 


Social networking seeks critical mass 


There are myriad social and professional networking options for scientists. But, so far, none 
has proved infectious enough to go viral. 


BY VIRGINIA GEWIN 


( ‘nas Facebook’s phenomenal success 
has not proved easy for scientists eager 
to develop a network focused on shared 
research interests. The number of scientific 
networking websites is growing, but none has 
emerged as a ‘go-to’ system. 

The indifference could stem from several fac- 
tors: lack of confidence in individual networks; 
concerns that personal data might be sold on; or 
the fact that no one site provides tools or features 
valuable enough to lure a majority of busy scien- 
tists. Despite the proliferation of networks, a big 
question remains, says Laura James, chief oper- 
ating officer at the Centre for Applied Research 
in Educational Technologies at the University of 
Cambridge, UK. What can they offer that more 
established sites, such as Facebook, don’t? 

A number of science networks are attempt- 
ing to answer that question by offering extra 
features. ResearchGATE, based in Cambridge, 
Massachusetts, has launched private fee-based 
accounts to give institutions or companies their 
own sub-communities. “Activity levels in the 
sub-communities are much higher” than in 
the network at large, says Ijad Madisch, chief 
executive and co-founder of the website. He 
says that companies want a communication 
solution — and ResearchGATE is happy to pro- 
vide it. According to Madisch, the website has 
created 30 sub-communities so far, and has a 
total of 700,000 members around the world, the 
most yet achieved by any science network, he 
says. At its launch in May 2008, ResearchGATE, 
like most such sites, focused on attracting single 
users — but in future, it will also cater to the 
needs of sub-communities, says Madisch. 
Some scientists may be wary of sharing too 
much, which probably hinders network adop- 
tion. “There is tension between collaboration 
and competition inherent to science,’ says 
James. “How willing will users be to share advan- 
tageous information, even with their closest 
collaborators, when they are still competing for 
grants?” Users may also be suspicious of a site's 
commercial intentions. “How these networks 
are monetized, and how their user communities 
will feel about the business models that emerge, 
remains to be seen,” says James. The creators of 
Mendeley, a social network that also hopes to 
attract institutions and companies, have vowed 
not to sell identifiable personal information. 
Aspects of some social sites have spiked in 
popularity — notably, virtual events. Greg 
Cruikshank, chief executive of the networking 
site LabRoots, based in Yorba Linda, Califor- 
nia, says that his company started BioConfer- 
ence Live, a series of online-only events, last year 
with much success. The first show, in Novem- 
ber 2009, had more than 10,000 attendees, who 
were given access to 70 sessions at which they 
could question speakers (in future, Cruikshank 
expects closer to 40). They could also browse a 
virtual lobby, exhibit hall and exhibitor booths. 
“Although LabRoots does a bit of everything — 
from news feeds to events directory, jobs board 
and blogs — what distinguishes us is the virtual- 
event arm of the business, and we'll continue to 
push that,” says Cruikshank. 


TOP-DOWN OR BOTTOM-UP? 

The bells and whistles vary, but all the sites have 
the same aim: enhancing collaboration. To that 
end, an open-source networking movement is 
afoot. Rather than creating buzz among users 
and building a community from the ground up, 
the creators of freely available research-network 
software — such as VIVO, developed at Cornell 
University in Ithaca, New York, with a US$12.2-million 
grant from the National Institutes of 
Health, and Harvard Catalyst Profiles, 
run by the Harvard Clinical and Trans- 
lational Science Center in Boston — are 
among those working to forge a network 
of scientists by wooing whole institutions 
to adopt their platforms and upload pro- 
files of their faculty members. “Openness 
is a big deal as VIVO carves out its niche,” 
says Mike Conlon, the site's director. 

This approach is gaining traction. On 
9 October, the US Department of Agricul- 
ture (USDA) was the first federal agency 
to join the 14 academic institutions 
worldwide that contribute to VIVO’s 
21,000-member network. Harvard Cata- 
lyst Profiles has more than 30,000 users, 
from institutions including Harvard Uni- 
versity and the University of California, 
San Francisco. “We believe that data used 
in networks need provenance and struc- 
ture — some kind of hosting institution 
or scientific society verifying the credibility 
of data,” says Conlon. Sharon Drumm, 
a staff officer for the USDA Agricultural 
Research Service in Beltsville, Maryland, 
says that the USDA wants VIVO to con- 
nect its 3,500 researchers with the hope of 
fostering collaborations, providing a sin- 
gle information repository for the public 
and allowing prospective retirees to pass 
on institutional knowledge. 

Conlon cautions that some networking 
sites simply ingest publicly available data, 
creating profiles of people who aren't 
users. Sites with no checks on data run 
the risk, he says, of offering less valuable 
information. James suggests that a high 
enough proportion of fake profiles could 
undermine entire sites. In principle, the 
VIVO and Profiles models are more reli- 
able, because they use vetted data. 

VIVO hopes to link profiles with not 
only publications or grants, but also the 
most important thing that scientists share 
— data. “Right now data are the stepchil- 
dren of this thing,” says Conlon. Linking 
data sets to people would allow potential 
collaborators to see what others can offer, 
he says. 

But the existence of a network of pro- 
files, even a vetted one, doesn’t necessarily 
mean that researchers will use it as more 
than just another database. “I think there's 
a difference between scientists actively 
using a network and simply signing up all 
the scientists in an institution,” says Men- 
deley chief executive Victor Henning. 

It is clear that no single site will meet all 
scientists’ needs. What isn’t clear is which 
combination of sites will. “Scientists are 
not really interested in social networking 
as an end in itself,” says Henning. “They
network to boost productivity.”


Virginia Gewin is a freelance writer in 
Portland, Oregon. 
TURNING POINT


Francesca Malfatti 


Francesca Malfatti, a postdoctoral researcher
at Scripps Institution of Oceanography in San
Diego, California, received the International 
Recognition of Professional Excellence prize in 
August from the International Ecology Institute 
in Oldendorf, Germany. The award honours 
young ecologists who make breakthroughs. 
Malfatti explains how she and her mentor found 
an unexpected relationship between microbes. 


How did you get a post in the United States? 
I was born in Trieste, Italy, and spent my summers
collecting shells along the Adriatic Sea. I 
decided to study marine biology at the Uni- 
versity of Trieste. I got a six-month grant from 
the Italian embassy in the United States, and 
could take that money anywhere in the United 
States. My mentors at Trieste collaborate with 
Farooq Azam, a marine ecologist at Scripps 
who specializes in microbial food webs, and 
he agreed to host me. We established a rapport 
and he has kept me on through my PhD and 
postdoc studies. 


How did you make a breakthrough discovery? 
My research advanced significantly when 
Dr Azam received money from the Gordon and 
Betty Moore Foundation in Palo Alto, Califor- 
nia, to buy a US$250,000 atomic-force micro- 
scope. Using this tool, we were able to conduct 
our investigations at different resolutions, and 
discovered an unexpected relationship between 
bacteria that were thought to be complemen- 
tary, yet isolated from one another. 


Was it difficult to confirm your findings? 

It took about a year and a half to understand 
how best to use the atomic-force microscope. 
Once I mastered it, I found co-occurrence of 
heterotrophic bacteria and cyanobacteria. It 
took another three months of more-quantita- 
tive measurements to validate the discovery. 


Why is this an important discovery? 

It changes how we perceive bacterial life in the 
open ocean. Cyanobacteria are primary produc- 
ers, and get energy from the Sun; heterotrophic 
bacteria are the major consumers of cyanobacte- 
ria. Our hypothesis is that cyanobacteria photo- 
synthesize part of the organic matter taken up 
by heterotrophs in a more tightly coupled, possi- 
bly symbiotic, system than we realized. This has 
implications for carbon cycling in the ocean. 


Why did you apply for the award? 

Dr Azam nominated me. In his 30 years of stud- 
ying marine microbial systems, he had never 
thought that heterotrophs and cyanobacteria 
would be associated for any part of their life-
times. He thought finding such an unexpected
relationship was worthy of the award. 


Why are you moving on from working with him? 
We will continue until I leave Scripps at the 
end of 2011. We have a solid working relation-
ship, but we decided to diversify. He thinks it
is best for my career and education that I see 
another lab or institution. So far, we have writ- 
ten down 21 wide-ranging ideas that we both 
think should be tackled. 


How will you decide how to divide the work? 
Luckily, there is so much to do that it will be 
easy to avoid overlap or competition. We are 
trying to be pragmatic about who can do what 
type of research. For example, if I end up doing 
another postdoc or become a starting profes- 
sor without much start-up money, I probably 
won't be able to do certain kinds of experi-
ments, such as those that require atomic-force
microscopy. 


What sort of impact do you hope your research 
will have? 

Dr Azam impressed upon me the need to create 
better models of dynamic ecological systems 
to serve policy-makers and humanity as a 
whole. Understanding the ocean helps humans 
because so many biogeochemical processes 
occur there. I hope my efforts help to build up 
an accurate ocean-based model of the planet 
that might help society both predict and safe- 
guard against any impacts of climate change. 


Do you feel pressure to follow up with another 
breakthrough? 

No. My brain doesn't work that way. We already 
live in a very competitive environment. I am a
self-motivator, and doing science well is good
enough. Not everybody can win a Nobel Prize,
but we still need strong science.


INTERVIEW BY VIRGINIA GEWIN 




BY JOHN FRIZELL 


“You have to do something about this
music,” said Ellie.

“I don’t play it loud.”

“I didn’t mean you. It’s... everyone.”

“I can't be responsible for the musical
tastes of the entire world.”

A deliberate misunderstanding was typi-
cal of her little brother. She glared at him.

“OK. I'll get some stuff.”

They were meeting in the neutral ground 
of the living room. Jamie sloped off to the 
seething maelstrom of mechanical creatures 
and things he called a room and returned 
carrying a red plastic box. She looked into it 
as he put it down. Nothing was moving. 

Jamie put Out of Space on the music sys- 
tem and turned it up until she winced. 

“That's ghastly.” 

“How’s this?” 

He fished something that looked like a big 
set of orange earmuffs out of the box and she 
put them on. The dreadful beat softened, 
almost vanished. 

“That’s better, but...” 

Her voice sounded funny. It wasn’t just 
the music, she couldn't hear herself, couldn't 
hear anything. 

“What are these?” she said, handing them 
back. 

“Ear defenders. I wear them for grinding 
or cutting or when you and Mum argue.” 

“It’s just the music I don’t want to hear. I
have to be able to hear everything else.”

“That's going to cost you.” 

They settled on a month of gourmet des- 
serts, prepared by Ellie. 

“Starting tomorrow night.” 

“Starting when it works.” 

“It will be working tomorrow night.”


Ellie put on the headphones Jamie handed 
her and looked dubiously at the pile of circuit 
boards into which they were plugged. She 
could see bare strips of copper on two of the 
boards. It didn’t look safe.

“Test version. Final result will be wireless 
and fit in your purse. Check it out.” 

He cranked up Out of Space. The beat 
pounded on her ears. 

“This doesn't work. I can still...” 

Abruptly the relentless beat died away. 

“What do you think?” 

She could hear him clearly, but not the 
music. She lifted one side of the headphones 
to check it was still on and quickly clamped 
them back. 


SILENCE 


Turn on, tune out.


“It didn't work at first but now it’s perfect.”

Jamie smiled. 

“It samples the track then it phones a serv-
ice to get the name. That’s why you can hear
the music for a few seconds. Once it knows 
the track it selects it from its database, syncs 
it up and then uses the waveform of the track 
to block out the incoming waveform.” 

She was not quite sure what he meant but 
could see the weaknesses. 

“So I’m always going to get a blast of 
sound, and if someone plays a song that isn’t 
on the database, I’m going to hear it.”

“Well, yes.” 

“It’s really good,” she said, using her best
positive, reinforcing voice, “but not quite
what I need. I'll do you crème brûlée tonight
and it won't count as part of your month's
total.”

Jamie could make anything work and, with
her encouragement, he would. She went to
the kitchen, got a vanilla pod from her spice 
drawer, split it lengthwise, scraped the seeds 
out while thinking of the social advantages 
of actually being able to follow conversations 
at parties, and then she cut the pod up into 
small pieces and cracked six eggs, separat- 
ing each into white and yolk. She would find 
something to do with the whites later. 


It took Jamie more than a week but the results 
were spectacular. Admittedly she had to train 
the system by pressing a button when she 
heard music she didn’t like, but once she had
done it the music never came back, not even 
if she pressed the training button halfway 
through and the music restarted from the 
beginning. Jamie went on about tonal analy- 
sis, pattern recognition, neural net process- 
ing, context algorithms and other mysterious 
stuff. It sounded like he would get another 
patent out of it and even more money in his 
trust fund. 


“Just one little thing. These headphones are
nice and very comfortable but I can't go around
wearing headphones all day.”

“Why not?” 

“Because I'd look like an idiot,” occurred
to her but Jamie didn’t understand anything 
to do with look or style. 

“Some kids wear headphones all the time 
they aren’t in class. Girls even.”

“Losers,” thought Ellie. Aloud she said: “I 
wouldn't want the people whose music I’m
tuning out to know.” 

He looked at her thoughtfully. 

“I like the way you wear your hair.”

Amazing! A compliment from Jamie. He 
was actually noticing her look. If he cared 
about how other people looked she could get 
him to care about how he looked. And once 
that was accomplished she could find him a 
girlfriend. Her mind raced, sorting through
the younger sisters of her friends. 

“I'll get some Eartalkers or hearing aid
inserts that are retained by your ears. The
hair will cover them. Stealth.”

“Oh. Well. See you in the kitchen.”


Jamie loved rich desserts — nothing stuck to 
his skinny frame. He was shovelling down the 
syllabub. Ellie allowed herself only a single 
teaspoon, just to get the taste. She had zested 
a lime and a lemon for this and she was con- 
vinced that the combined citrus taste was 
much better than the standard recipe of lemon 
alone. She had added a bit of cardamom as 
well, giving the traditional English dessert a 
hint of something Middle Eastern and exotic. 
She decided on a second teaspoon. 

Jamie was talking happily between bites 
about how he had accomplished this latest 
feat of engineering. She smiled and nodded 
in the pauses. The thing could detect more 
patterns than just the ones in music — she 
could see his lips moving but she could not 
hear a word he said.


John Frizell was trained in biochemistry 
and works on ocean conservation for 
Greenpeace. In his spare time he walks, 
builds robots and writes short stories. 
BRIEF COMMUNICATIONS ARISING


Was the universal common ancestry proved? 


ARISING FROM D. L. Theobald Nature 465, 219-222 (2010) 


The question of whether or not all life on Earth shares a single common 
ancestor has been a central problem of evolutionary biology since 
Darwin’. Although the theory of universal common ancestry (UCA) 
has gathered a compelling list of circumstantial evidence, as given in 
ref. 2, there has been no attempt to test statistically the UCA hypothesis 
among the three domains of life (eubacteria, archaebacteria and eukar- 
yotes) by using molecular sequences. Theobald’ recently challenged this 
problem with a formal statistical test, and concluded that the UCA 
hypothesis holds. Although his attempt is the first step towards establish-
ing the UCA theory with a solid statistical basis, we think that the test of
Theobald is not sufficient to reject the alternative hypothesis of
the separate origins of life, despite the Akaike information criterion
(AIC) of model selection’ giving a clear distinction between the com- 
peting hypotheses. 

Dawkins* argued that even though it may, at first, seem unlikely 
that such a complex structure as the eye evolved by selection, it could 
have been realized by a long sequence of small evolutionary steps 
driven by selection. Theobald’ mentions that statistically significant 
sequence similarity can arise from factors other than common ancestry, 
such as convergent evolution due to selection, but such factors were not 
taken into account in his ‘formal’ test to reject the independent origins 
hypothesis. 

Table 1 shows that the formal test provides support for a common 
origin of two putatively unrelated genes, mitochondrial cytb and nd2, 
with no homology. However, we believe that this result should not be 
regarded as evidence of the ultimate common ancestry of cytb and 
nd2. This raises a question mark as to the effectiveness of the formal 


Table 1 | A formal test of the common ancestry between mitochondrial
genes cytb and nd2

Test statistic        Score or value    Number of parameters

Common origin
  lnL (cytb + nd2)    −5,090.20         18
  AIC                 10,216.4

Independent origin
  lnL (cytb)          −2,503.82         12
  lnL (nd2)           −2,608.17         12
  Total lnL           −5,111.99         24
  AIC                 10,271.97


Nucleotide sequences of the mitochondrial genes cytb and nd2 from cow, deer and hippopotamus were
analysed by PAML’° with the GTR + Γ model assuming the relations of ((cow, deer), hippopotamus) for
the common origin model. The 5′-terminal 1,038 bp (excluding the initiation codon) were used without
making further alignment between the two different genes. The common origin model gave a lower AIC
value than the independent origin model. lnL, log-likelihood score.


test applied by Theobald’. It should be noted that, because alignment 
gives a bias for common ancestry, we did not make an alignment 
between cytb and nd2. To reject the separate origins hypothesis of 
the domains of life, it would be indispensable to develop a more 
‘biological’ test to show that even by improving the model of the 
separate origins by taking into account biological factors such as the 
possibility of convergent evolution due to selection, the UCA hypo- 
thesis is still supported by the AIC. To do this, it is necessary to 
develop an entirely new methodological framework of molecular 
phylogenetics that is different from the conventional framework that 
neglects convergent and parallel evolution. Notably, there have been 
many reported cases of convergent and parallel evolution misleading 
molecular phylogenetic inference’, and such a method is needed for 
molecular phylogenetics in general. 
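
The AIC comparison reported in Table 1 can be checked by hand. The sketch below is illustrative only: it assumes the standard definition AIC = 2k − 2 lnL, where k is the number of free parameters, and the function name `aic` is ours rather than part of PAML. The log-likelihoods and parameter counts are copied from Table 1.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion, AIC = 2k - 2 lnL (lower is preferred)."""
    return 2 * n_params - 2 * log_likelihood


# Common origin: one model fitted to cytb and nd2 together (values from Table 1).
aic_common = aic(-5090.20, 18)

# Independent origin: log-likelihoods and parameter counts of the two
# separately fitted models are summed.
aic_independent = aic(-2503.82 + -2608.17, 12 + 12)

print(f"common origin:      AIC = {aic_common:.2f}")
print(f"independent origin: AIC = {aic_independent:.2f}")
print("preferred model:", "common" if aic_common < aic_independent else "independent")
```

Because the inputs are the rounded values printed in the table, the recomputed independent-origin score can differ from the tabulated one in the last digit. The ordering of the two models is unaffected.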


Theobald reply 


REPLYING TO T. Yonezawa & M. Hasegawa Nature 468, doi:10.1038/nature09482 (2010)


Yonezawa and Hasegawa’ provide an example from two apparently 
unrelated families of nucleic acid coding sequences for which an 
Akaike information criterion (AIC) model selection test, similar to 
mine’, chooses a common origin hypothesis. Although this may seem 
surprising, the coding sequences in this example were aligned in the 
same reading frame. The constraints of the genetic code are expected 
to induce correlations between these sequences (and among all coding 
sequences) that are not due to common ancestry. For instance, owing to 
codon bias and the structure of the genetic code, in these sequences the 
second codon position is biased towards T (about twofold over average), 
whereas the third position is usually an A (~50%) and rarely a G (~4%). 
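
The position-specific bias described above can be measured directly from any in-frame coding sequence. The following is a minimal sketch, not the analysis used in this reply: the function name `codon_position_frequencies` and the toy sequence are hypothetical, and real data would be the cytb and nd2 sequences themselves.

```python
from collections import Counter


def codon_position_frequencies(seq):
    """Return three dicts (codon positions 1, 2, 3) mapping each of A, C, G, T
    to its frequency at that position. The sequence is assumed to be in frame."""
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    freqs = []
    for pos in range(3):
        column = seq[pos:usable:3]  # every third base, starting at this position
        counts = Counter(column)
        total = len(column)
        freqs.append({nt: (counts[nt] / total if total else 0.0) for nt in "ACGT"})
    return freqs


# Hypothetical five-codon toy sequence: CTA ATA CTT TTA GTA
toy = "CTAATACTTTTAGTA"
freqs = codon_position_frequencies(toy)
for position, table in enumerate(freqs, start=1):
    print(position, table)
```

Two unrelated genes that share this kind of compositional skew at each codon position will look correlated to a plain nucleotide model when they are compared in the same reading frame, which is the effect described here.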

One can account for these correlations explicitly by using codon 
models (as implemented in PAML’, codonFreq = 2 or 3) or standard 
amino acid models (as in PhyML*). With these more realistic models, 
independent ancestry is the strongly preferred hypothesis. 
Furthermore, the raw likelihoods and AIC scores increase signifi- 
cantly (by hundreds to thousands of logs), indicating that codon 
and amino acid models are greatly superior to the naive nucleotide 
models. 

Yonezawa and Hasegawa' point out that I’ did not explicitly test 
models in which selection or biophysical constraints generate 
sequence correlations among proteins with independent origins. 
Formal phylogenetic models accounting for such factors are currently 
unavailable; their development would be a welcome advance. 
Although these are important considerations for proteins with low 
sequence similarity, neither selection nor physical constraints alone 
can plausibly generate the high levels of sequence similarity (>55% 
average sequence identity) observed in the universal protein data set 
that I used*°. The amount of adaptive convergence necessary to pro- 
duce thousands of identical amino acids among 23 different proteins 
from completely independent beginnings is not comparable to the 
limited molecular convergence seen with, for example, homologous 
digestive lysozymes®, in which already highly similar proteins (in 
function, structure and sequence) later acquired a handful of identical 
substitutions in parallel. 

How could selection or biophysical constraints induce correlations 
among unrelated sequences? If certain similar amino acid sequences 
are necessary for performing specific functions (or for adopting a 
specific tertiary conformation that is necessary for function), then 
selection for function may ‘lead’ proteins with independent origins 
to neighbouring regions of sequence space. However, no particular 
protein sequence or fold is necessary for any given function. There are 
abundant examples of proteins with undetectable sequence similarity 
and different folds that perform the same biochemical and cellular 
functions’. For example, the proteases subtilisin, trypsin and carbox- 
ypeptidase have the same active site and mechanism, whereas papain, 
renin and thermolysin have different active sites and different 
mechanisms. All six proteases have radically different folds and 
sequences. Because different folds in general have different sequence 
requirements, proteins with the same function need not have similar 
sequences. 

Even assuming that a certain protein fold is necessary for a given 
function, current molecular evidence indicates that sequence require- 
ments for a fold are extremely low—nearly indistinguishable from 
random. This data comes from many independent sources from 
throughout biology. 

Many large classes of proteins with identical folds have no detectable 
sequence similarity (for example, families of TIM barrels, carbonic 
anhydrases, OB-folds, SH3 domains, Rossmann folds and immuno-
globulin domains). These proteins provide prima facie evidence that
sequence requirements for any particular fold and function are nearly 
indistinguishable from random. Protein domains in the SCOP data- 
base* from different superfamilies yet with the same fold share ~9% 
sequence identity”. 

Identical folds with known independent origins have nearly ran- 
dom sequence similarity”"®. For example, unrelated proteins with the 
same fold from the MALISAM database share 8.5 + 0.4% sequence 
identity”"®. This data can be used to estimate the correlations among 
independently evolved and created proteins with the same fold, and 
the correlations are nearly random. In the universal protein data set 
that I used’, the average sequence correlation induced by common 
ancestry is roughly one log-likelihood per site for the most divergent 
proteins. In contrast, the correlations among independent proteins 
with the same fold are ~100 times weaker. From this we can estimate
that model selection scores for common ancestry hypotheses will be 
many thousands of logs greater than competing selection hypotheses. 

Even the most conserved proteins have not yet reached the limits of 
sequence space, which has been estimated to be near the random 
expectation for any given fold and function". 

These arguments are largely circumstantial and informal. I have 
not tested all possible competing hypotheses, and my analysis will not 
be the “last word on common ancestry”"”. I emphasize that I have in 
no sense provided an absolute ‘proof’ of universal common ancestry.
One of the great advantages of the model selection framework that I 
presented is that if a novel model is proposed with a well-defined 
likelihood function, then we can easily compare it to the common 
ancestry models and see how it fares. 


Global systematics of arc volcano position 


ARISING FROM Grove, T. et al. Nature 459, 694-697 (2009) 


Global systematics in the location of volcanic arcs above subduction 
zones'” are widely considered to be a clue to the melting processes that 
occur at depth, and the locations of the arcs have often been explained 
in terms of the release of hydrous fluids near the top of the subducting 
slab (see, for example, refs 3-6). Grove et al.⁷ conclude that arc volcano
location is controlled by melting in the mantle at temperatures above
the water-saturated upper-mantle solidus and below the upper limit of
stability of the mineral chlorite and, in particular, that the arc fronts lie
directly above the shallowest point of such melt regions in the mantle.
Here we show that this conclusion is incorrect because the calculated
arc locations of Grove et al.⁷ are in error owing to the inadequate spatial
resolution of their numerical models, and because the agreement that 
they find between predicted and observed systematics arises from a 
spurious correlation between calculated arc location and slab dip. A 
more informative conclusion to draw from their experiments is that 
the limits of chlorite stability (figure 1b of ref. 7) cannot explain the 
global systematics in the depth to the slab beneath the sharply localized 
arc fronts. 

Grove et al.⁷ hypothesize that arc volcano location is controlled by
melting in the mantle at pressure and temperature conditions defined
as ‘P, Tmelt’ in their figure 1b. Grove et al.⁷ then use numerical models of
subduction zones to predict arc location and its global systematics.
They conclude that the agreement between their calculated systematics
of arc location and observations of real subduction zones** validates
their hypothesis (figure 3 of ref. 7) but closer inspection of the shape of
the P, Tmelt region casts doubt upon this conclusion. A characteristic
feature of subduction-zone models’ is the narrow thermal boundary
layer, sub-parallel to and just above the slab surface, which contains the
temperature range of P, Tmelt (~800-850 °C). For all but the slowest
convergence rates, this boundary layer begins close to the depth at
which the slab is viscously coupled to the wedge. Hence we should
expect the region enclosing P, Tmelt to be a very thin, continuous layer
above the slab, with its shallowest extent at an almost constant depth.


The results of Grove et al.⁷ (green squares in their figure 2) are in-
consistent with this expectation, and raise the suspicion of an error in
their calculations.

To locate their region of P, Tmelt, Grove et al.⁷ determined which
nodes of their 2.3 × 2.3-km computational mesh lay within that P-T
range. Because those conditions occur within a boundary layer only a
few kilometres thick that is inclined at an angle to the mesh, this
procedure did not resolve the full extent of the P, Tmelt region. To
check their results, we calculated the temperature fields for subduc-
tion zones on a 1 × 1-km grid, then resampled it to both 2.3-km
resolution and to 0.25-km resolution. This was done for a range of
subduction parameters and for each calculation we determined the P,
Tmelt region and its shallowest point. We found that at 2.3-km reso-
lution, the minimum depth of P, Tmelt ranged between about 57 and
76 km, consistent with the range found by Grove et al.⁷. On the
0.25 × 0.25-km grid, however, the minimum depth was confined
between 57 and 61 km (Fig. 1a), consistent with the expectations we
describe in the preceding paragraph. At either resolution, the minimum
depth of P, Tmelt is independent of the slab dip and of the convergence
rate.
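
The resolution problem can be illustrated with a toy geometry. This sketch is not the thermal model used here or by Grove et al.: it assumes a planar slab and a melting band of fixed 0.5-km thickness lying parallel to it, and every name and default value (`shallowest_node_in_band`, `top_depth`, `band_offset`, `band_thickness`) is hypothetical. It only shows that picking grid nodes inside a thin inclined band can miss the band's true shallowest point when the node spacing exceeds the band thickness.

```python
import math


def shallowest_node_in_band(spacing, dip_deg, top_depth=57.0,
                            band_offset=1.0, band_thickness=0.5,
                            width=200.0, max_depth=200.0):
    """Minimum depth (km) of the grid nodes that fall inside a thin band
    lying parallel to, and just above, a planar dipping slab surface.
    Returns None if no node falls inside the band."""
    slope = math.tan(math.radians(dip_deg))
    n_x = int(width / spacing) + 1
    n_z = int(max_depth / spacing) + 1
    best = None
    for i in range(n_x):
        x = i * spacing
        slab_depth = top_depth + x * slope   # depth of the slab surface at x
        deep_edge = slab_depth - band_offset  # deepest edge of the band
        shallow_edge = deep_edge - band_thickness
        for j in range(n_z):
            z = j * spacing
            if shallow_edge <= z <= deep_edge and (best is None or z < best):
                best = z
    return best


# In this toy set-up the true shallowest point of the band is 57 - 1 - 0.5 = 55.5 km.
fine = shallowest_node_in_band(0.25, dip_deg=30.0)
coarse = shallowest_node_in_band(2.3, dip_deg=30.0)
print("0.25-km nodes:", fine)
print("2.3-km nodes: ", coarse)
```

On the fine grid every column of nodes intersects the band, so its shallowest point is recovered. On the coarse grid most columns contain no node inside the band, and the apparent minimum depth is set by wherever a node happens to land. That depth shifts with the dip, which is how an artificial dependence on dip can enter the results.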

Grove et al.⁷ compare their calculations with seismic studies, which
show that the depth of the slab beneath arcs varies between ~80 and
~150 km (refs 2, 8) and has a negative correlation with the descent
speed of the slab (Fig. 1b). The depth to the top of the slab predicted by
the hypothesis of Grove et al.⁷ applied under our recalculations is
~60-75 km, independent of dip or convergence rate (Fig. 1b), and
thus does not agree with the observations.

The agreement between model and observations in Grove et al.7 is
spurious, and is the result of their choice of variables. Figure 1c
recreates their figure 3, which shows the apparent consistency between
model and observations, using our recalculated location of arcs. The
sine of slab dip is plotted on the x axis, and on the y axis is the
arc–trench distance, which for all points (calculated and observed; see


[Figure 1, panels a–c: plotted data not reproduced. Horizontal axis of panel a: sin(Dip).]


Figure 1 | Arc position versus subduction parameters for data and models.
a, Calculated depth Dmelt of the shallowest portion of the P, Tmelt-based melting
field (compare figures 1 and 2 of ref. 7). Calculations were carried out on a 1-km
finite-volume mesh’, for dip of 30° to 70° in steps of 10°, and for convergence
rate V from 30 to 100 mm yr−1, in steps of 10 mm yr−1; these ranges include the
parameters of the calculations of ref. 7. The points correspond to the minimum
depths of melting calculated according to the hypothesis and methods of Grove
et al.7 for a 2.3 × 2.3-km resampled grid (open triangles) and for a
0.25 × 0.25-km resampled grid (filled triangles). b, Diamonds show the depth of
the slab Dslab, determined seismologically* (error bars as described by ref. 2);
filled triangles show the calculated Dslab below the locus of shallowest melting,
for the 0.25 × 0.25-km resampled grid from panel a. The red triangles correspond
to the corrected values of Dslab for the combinations of dip and convergence rate
used by ref. 7 (T. Grove et al., personal communication). The grey line
corresponds to a constant Dslab = 62 km. c, This panel corresponds to the lower
200 km of figure 3 in ref. 7. Points as for panel b, plotted for the horizontal
distance between the trench and the arc, which is equal to Dslab/tan(Dip), the
quantity on the y axis of figure 3 of ref. 7. The grey line corresponds to
Dslab = 62 km and demonstrates the spurious correlation referred to in the
main text.
[Horizontal axes: V sin(Dip) (mm yr−1) for panel b; sin(Dip) for panel c.]
table 1 in ref. 7) is taken as the depth of the slab divided by the tangent
of the dip. The presence of the sine of the dip on each axis ensures a
spurious correlation; this is illustrated clearly in Fig. 1c by the grey line
that corresponds to a constant value of the depth of the slab,
Dslab = 62 km.
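
The spurious correlation can be reproduced numerically. The sketch below assumes nothing beyond the definitions in the text: it holds the slab depth fixed at 62 km, computes the arc–trench distance as depth divided by tan(dip) for dips of 30° to 70°, and correlates it with sin(dip). The variable names are illustrative.

```python
import numpy as np

D_SLAB_KM = 62.0                          # constant slab depth (grey line in Fig. 1c)

dips_deg = np.arange(30.0, 71.0, 10.0)    # dips spanning the modelled range
dips = np.radians(dips_deg)

sin_dip = np.sin(dips)                    # quantity on the x axis
arc_trench_km = D_SLAB_KM / np.tan(dips)  # quantity on the y axis

# Pearson correlation between the two plotted quantities
r = np.corrcoef(sin_dip, arc_trench_km)[0, 1]

if __name__ == "__main__":
    for d, s, a in zip(dips_deg, sin_dip, arc_trench_km):
        print(f"dip {d:4.0f} deg  sin(dip) {s:5.3f}  arc-trench distance {a:6.1f} km")
    print(f"correlation coefficient: {r:.3f}")
```

Because both plotted quantities are functions of dip alone when the slab depth is constant, a strong negative correlation appears even though the depth carries no information about dip.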

Therefore there is no significance in the match between models and
observations reported by Grove et al.7, and their conclusion that “the
kinematic control on the location of mantle melting is primarily slab
dip” (page 696 of ref. 7) is mistaken. Instead, we conclude from their
experiments that the limits of chlorite stability (figure 1b of ref. 7)
cannot explain the global systematics in the depth of the slab beneath
sharply localized arc fronts, which is true for any strongly
temperature-dependent process that takes place near the top of the slab,
as we have discussed. In ref. 10 we suggest a process that can account
for the global systematics in location of the arcs.


Philip C. England1 & Richard F. Katz1
1Department of Earth Sciences, Parks Road, Oxford OX1 3PR, UK. 
e-mail: philip.england@earth.ox.ac.uk 


Grove et al. reply 


Received 7 September 2009; accepted 12 April 2010. 


1. Tovish, A. & Schubert, G. Island arc curvature, velocity of convergence and angle of 
subduction. Geophys. Res. Lett. 5, 329-332 (1978). 

2. England, P., Engdahl, R. & Thatcher, W. Systematic variation in the depths of slabs 
beneath arc volcanoes. Geophys. J. Int. 156, 377-408 (2004). 

3. Gill, J. Orogenic Andesites and Plate Tectonics (Springer, 1981). 

4. Tatsumi, Y. & Eggins, S. Subduction Zone Magmatism (Blackwell Science, 1995). 

5. Iwamori, H. Transportation of H2O and melting in subduction zones. Earth Planet.
Sci. Lett. 160, 65-80 (1998). 

6. Tatsumi, Y. The subduction factory: how it operates in the evolving Earth. GSA 
Today 15, 4-10 (2005). 

7. Grove, T., Till, C., Lev, E., Chatterjee, N. & Medard, E. Kinematic variables and water 
transport control the formation and location of arc volcanoes. Nature 459, 
694-697 (2009); erratum 460, 1044 (2009). 

8. Syracuse, E. & Abers, G. Global compilation of variations in slab depth beneath arc 
volcanoes and implications. Geochem. Geophys. Geosyst. 7, Q05017, doi:10.1029/ 
2005GC001045 (2006). 

9. van Keken, P. et al. A community benchmark for subduction zone modeling. Phys. 
Earth Planet. Inter. 171, 187-197 (2008). 

10. England, P. C. & Katz, R. F. Melting above the anhydrous solidus controls the 
location of volcanic arcs. Nature 467, 700-703 (2010). 


Competing financial interests: declared none. 


doi:10.1038/nature09154 


REPLYING TO England, P. C. & Katz, R. F. Nature 468, doi:10.1038/nature09154 (2010) 


In their Comment England and Katz1 suggest that our model2 contains
two flaws and that there are additional problems in our thermal
models. This Reply points out an important part of our model that
England and Katz1 appear to have missed, addresses their suggestion
that there are flaws and discusses whether our thermal models are in
error.

The Comment1 states that we “conclude that the arc fronts lie
directly above the shallowest point [that satisfies the P, Tmelt criterion]
in the mantle”. This corresponds to our path A in figure 1 of ref. 2. The
P, Tmelt criterion described in ref. 1 refers to the melting that initiates
just above the slab over a range of depths, illustrated between paths A
and C in figure 1 of ref. 2. As we discussed2, when these initial melts
ascend into the overlying mantle wedge, not all of them will experience
a pressure-temperature path that allows them to erupt from
an arc volcano on the Earth’s surface. The melts formed at the
shallowest depths (path A) will encounter cooler mantle as they ascend
into the overlying mantle wedge and these melts will freeze. Only
melts that ascend into the hottest interior portion of the mantle wedge
(such as path B in figure 1 of ref. 2) will undergo sufficient melting to
produce arc front volcanoes. To summarize our findings2, there are
two important factors that control the location of arc volcanoes: (1)
chlorite dehydration releases H2O near the slab–wedge interface, and
the H2O ascends into overlying mantle that is above the
H2O-saturated mantle solidus (P, Tmelt in figure 1b of ref. 2) and (2) the
temperature of the overlying mantle wedge increases with decreasing
pressure to allow flux melting to continue to high extents and allow
these high-extent melts to erupt at arc volcanoes (path B in figure 1 of
ref. 2).

England and Katz also state that the agreement that we2 “find
between predicted and observed systematics arises from a spurious
correlation between calculated arc location and slab dip” (ref. 1). They
attribute this purported spurious correlation in our figure 3 (ref. 2) to
the presence of the tangent function on the vertical axis and a sine
function on the horizontal axis. Although there is trigonometry
involved in the correlation shown on this figure2, the relations are
not spurious and are meaningful. The salient point in our figure 3 is
that the beginning of H2O-saturated melting in our modelling (path A
in figure 1a of ref. 2) consistently occurs at a depth of 60-70 km near
the slab–wedge interface and is independent of the convergence rate
and dip. We point out that these shallowest melts do not reach the
surface (figure 1 of ref. 2), nor do they influence the location of
volcanoes. Instead, the maximum amount of melting and hence the
location of arc volcanoes are controlled by the position of the hottest
part of the wedge above a slab. This is the region between paths B and
C (figure 1 of ref. 2), the region of maximum melting from our models.
The arc–trench distance for paths B to C, and thus the location of arc
volcanoes, is close to the values reported by England et al.3 and parallel
to the trend of Syracuse and Abers4. The distance a given isotherm is
from the trench decreases with increasing convergence rate and spans
a range of values that are represented in the data of ref. 3. An
interesting outcome of our thermal modelling (figure 2 of ref. 2) is that at
steep dip angles, paths A and B occur at very similar distances from
the trench.

England and Katz say that our thermal modelling results (in figure
2 of ref. 2) “raise the suspicion of an error in [our] calculations” (ref.
1). England and Katz continue with a discussion of grid size in the
numerical calculations that they performed, but it is impossible for
us or for any reader of the Comment1 to assess the veracity of their
claim that we2 “did not resolve the full extent of the P, Tmelt region”
(ref. 1). We have verified our modelling methods using the community
benchmarks developed for subduction zone modelling5 and
we also find that our model results for the temperature structure
near the slab–wedge interface are comparable to those of others who
have benchmarked their models, such as Wada and Wang6, who
explicitly considered the issues associated with slab–mantle viscous
coupling.

Thus, we disagree with the conclusion reached by England and
Katz1 that “the limits of chlorite stability cannot explain the global
systematics in the depth of the slab beneath sharply localized arc
fronts”. The conclusions we reached in ref. 2 rely on the interplay
of two important controls on hydrous melting in the mantle wedge
above subducted slabs: the dehydration of chlorite near the base of the
wedge and the temperature structure of the overlying mantle wedge.
Was the universal common ancestry proved?


ARISING FROM D. L. Theobald Nature 465, 219-222 (2010) 


The question of whether or not all life on Earth shares a single common
ancestor has been a central problem of evolutionary biology since
Darwin1. Although the theory of universal common ancestry (UCA)
has gathered a compelling list of circumstantial evidence, as given in
ref. 2, there has been no attempt to test statistically the UCA hypothesis
among the three domains of life (eubacteria, archaebacteria and
eukaryotes) by using molecular sequences. Theobald2 recently took on this
problem with a formal statistical test, and concluded that the UCA
hypothesis holds. Although his attempt is the first step towards
establishing the UCA theory with a solid statistical basis, we think that the
test of Theobald2 is not sufficient to reject the alternative hypothesis of
the separate origins of life, despite the Akaike information criterion
(AIC) of model selection3 giving a clear distinction between the
competing hypotheses.

Dawkins4 argued that even though it may, at first, seem unlikely
that such a complex structure as the eye evolved by selection, it could
have been realized by a long sequence of small evolutionary steps
driven by selection. Theobald2 mentions that statistically significant
sequence similarity can arise from factors other than common ancestry,
such as convergent evolution due to selection, but such factors were not
taken into account in his ‘formal’ test to reject the independent origins
hypothesis.

Table 1 shows that the formal test provides support for a common 
origin of two putatively unrelated genes, mitochondrial cytb and nd2, 
with no homology. However, we believe that this result should not be 
regarded as evidence of the ultimate common ancestry of cytb and 
nd2. This raises a question mark as to the effectiveness of the formal 


Table 1 | A formal test of the common ancestry between mitochondrial
genes cytb and nd2

Test statistic          Score or value    Number of parameters

Common origin
  lnL (cytb + nd2)      −5,090.20         18
  AIC                   10,216.4

Independent origin
  lnL (cytb)            −2,503.82         12
  lnL (nd2)             −2,608.17         12
  Total lnL             −5,111.99         24
  AIC                   10,271.97

Nucleotide sequences of the mitochondrial genes cytb and nd2 from cow, deer and hippopotamus were
analysed by PAML’° with the GTR + Γ model assuming the relations of ((cow, deer), hippopotamus) for
the common origin model. The 5′-terminal 1,038 bp (excluding the initiation codon) were used without
making further alignment between the two different genes. The common origin model gave a lower AIC
value than the independent origin model. lnL, log-likelihood score.


test applied by Theobald2. It should be noted that, because alignment
gives a bias for common ancestry, we did not make an alignment 
between cytb and nd2. To reject the separate origins hypothesis of 
the domains of life, it would be indispensable to develop a more 
‘biological’ test to show that even by improving the model of the 
separate origins by taking into account biological factors such as the 
possibility of convergent evolution due to selection, the UCA hypo- 
thesis is still supported by the AIC. To do this, it is necessary to 
develop an entirely new methodological framework of molecular 
phylogenetics that is different from the conventional framework that 
neglects convergent and parallel evolution. Notably, there have been 
many reported cases of convergent and parallel evolution misleading 
molecular phylogenetic inference’, and such a method is needed for 
molecular phylogenetics in general. 
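
The AIC values in Table 1 can be checked from the log-likelihoods and parameter counts with the standard definition AIC = 2k − 2 lnL. The short sketch below does this, taking lnL = −5,090.20 for the common-origin model (the value consistent with the reported AIC of 10,216.4); the function and variable names are illustrative.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion, AIC = 2k - 2 lnL (lower is preferred)."""
    return 2 * n_params - 2 * log_likelihood


# Common origin: one tree for cytb and nd2 together, 18 parameters
aic_common = aic(-5090.20, 18)

# Independent origin: log-likelihoods and parameter counts of the two
# separately fitted genes are summed
aic_independent = aic(-2503.82 + -2608.17, 12 + 12)

if __name__ == "__main__":
    print(f"common origin AIC:      {aic_common:.2f}")
    print(f"independent origin AIC: {aic_independent:.2f}")
    print(f"difference:             {aic_independent - aic_common:.2f}")
```

The small mismatch in the last digit of the independent-origin AIC relative to the table presumably reflects rounding of the published log-likelihoods.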
Theobald reply


REPLYING TO T. Yonezawa & M. Hasegawa Nature 468, doi:10.1038/nature09482 (2010)


Yonezawa and Hasegawa’ provide an example from two apparently 
unrelated families of nucleic acid coding sequences for which an 
Akaike information criterion (AIC) model selection test, similar to 
mine’, chooses a common origin hypothesis. Although this may seem 
surprising, the coding sequences in this example were aligned in the 
same reading frame. The constraints of the genetic code are expected 
to induce correlations between these sequences (and among all coding 
sequences) that are not due to common ancestry. For instance, owing to 
codon bias and the structure of the genetic code, in these sequences the 
second codon position is biased towards T (about twofold over average), 
whereas the third position is usually an A (~50%) and rarely a G (~4%). 
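
The kind of position-specific composition bias described here can be measured directly from an in-frame coding sequence. The sketch below is a minimal illustration, not the analysis used in this Reply: the function name is hypothetical and the example sequence is an arbitrary toy string rather than cytb or nd2 data.

```python
from collections import Counter


def base_frequencies_by_codon_position(seq):
    """Return three dicts giving A/C/G/T frequencies at codon positions 1, 2 and 3.

    The sequence is assumed to be in frame, starting at the first codon
    position; trailing bases that do not complete a codon are ignored.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    if usable == 0:
        raise ValueError("sequence must contain at least one complete codon")

    freqs = []
    for pos in range(3):
        counts = Counter(seq[pos:usable:3])   # every third base, starting at pos
        total = sum(counts.values())
        freqs.append({base: counts.get(base, 0) / total for base in "ACGT"})
    return freqs


if __name__ == "__main__":
    toy = "ATGCTAATTCTA"                      # hypothetical in-frame sequence
    for i, f in enumerate(base_frequencies_by_codon_position(toy), start=1):
        print(f"codon position {i}: {f}")
```

Two unrelated genes read in the same frame will share whatever skew these per-position frequencies show, which is one way a nucleotide model that ignores codon structure can be misled.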

One can account for these correlations explicitly by using codon 
models (as implemented in PAML’, codonFreq = 2 or 3) or standard 
amino acid models (as in PhyML*). With these more realistic models, 
independent ancestry is the strongly preferred hypothesis. 
Furthermore, the raw likelihoods and AIC scores increase signifi- 
cantly (by hundreds to thousands of logs), indicating that codon 
and amino acid models are greatly superior to the naive nucleotide 
models. 

Yonezawa and Hasegawa' point out that I’ did not explicitly test 
models in which selection or biophysical constraints generate 
sequence correlations among proteins with independent origins. 
Formal phylogenetic models accounting for such factors are currently 
unavailable; their development would be a welcome advance. 
Although these are important considerations for proteins with low 
sequence similarity, neither selection nor physical constraints alone 
can plausibly generate the high levels of sequence similarity (>55% 
average sequence identity) observed in the universal protein data set 
that I used*°. The amount of adaptive convergence necessary to pro- 
duce thousands of identical amino acids among 23 different proteins 
from completely independent beginnings is not comparable to the 
limited molecular convergence seen with, for example, homologous 
digestive lysozymes®, in which already highly similar proteins (in 
function, structure and sequence) later acquired a handful of identical 
substitutions in parallel. 

How could selection or biophysical constraints induce correlations 
among unrelated sequences? If certain similar amino acid sequences 
are necessary for performing specific functions (or for adopting a 
specific tertiary conformation that is necessary for function), then 
selection for function may ‘lead’ proteins with independent origins 
to neighbouring regions of sequence space. However, no particular 
protein sequence or fold is necessary for any given function. There are 
abundant examples of proteins with undetectable sequence similarity 
and different folds that perform the same biochemical and cellular 
functions’. For example, the proteases subtilisin, trypsin and carbox- 
ypeptidase have the same active site and mechanism, whereas papain, 
renin and thermolysin have different active sites and different 
mechanisms. All six proteases have radically different folds and 
sequences. Because different folds in general have different sequence 
requirements, proteins with the same function need not have similar 
sequences. 

Even assuming that a certain protein fold is necessary for a given 
function, current molecular evidence indicates that sequence require- 
ments for a fold are extremely low—nearly indistinguishable from 
random. This data comes from many independent sources from 
throughout biology. 

Many large classes of proteins with identical folds have no detectable
sequence similarity (for example, families of TIM barrels, carbonic
anhydrases, OB-folds, SH3 domains, Rossmann folds and immunoglobulin
domains). These proteins provide prima facie evidence that
sequence requirements for any particular fold and function are nearly
indistinguishable from random. Protein domains in the SCOP database*
from different superfamilies yet with the same fold share ~9%
sequence identity”.

Identical folds with known independent origins have nearly ran- 
dom sequence similarity”"®. For example, unrelated proteins with the 
same fold from the MALISAM database share 8.5 + 0.4% sequence 
identity”"®. This data can be used to estimate the correlations among 
independently evolved and created proteins with the same fold, and 
the correlations are nearly random. In the universal protein data set 
that I used’, the average sequence correlation induced by common 
ancestry is roughly one log-likelihood per site for the most divergent 
proteins. In contrast, the correlations among independent proteins 
with the same fold are ~100 times weaker. From this we can estimate
that model selection scores for common ancestry hypotheses will be 
many thousands of logs greater than competing selection hypotheses. 

Even the most conserved proteins have not yet reached the limits of 
sequence space, which has been estimated to be near the random 
expectation for any given fold and function". 

These arguments are largely circumstantial and informal. I have 
not tested all possible competing hypotheses, and my analysis will not 
be the “last word on common ancestry”"”. I emphasize that I have in 
no sense provided an absolute ‘proof’ of universal common ancestry.
One of the great advantages of the model selection framework that I 
presented is that if a novel model is proposed with a well-defined 
likelihood function, then we can easily compare it to the common 
ancestry models and see how it fares. 
H2AX prevents CtIP-mediated DNA end resection
and aberrant repair in G1-phase lymphocytes
DNA double-strand breaks (DSBs) are generated by the recombination
activating gene (RAG) endonuclease in all developing lymphocytes
as they assemble antigen receptor genes’. DNA cleavage
by RAG occurs only at the G1 phase of the cell cycle and generates
two hairpin-sealed DNA (coding) ends that require nucleolytic
opening before their repair by classical non-homologous
end-joining (NHEJ)*®. Although there are several cellular nucleases that
could perform this function, only the Artemis nuclease is able to do
so efficiently**. Here, in vivo, we show that in murine cells the
histone protein H2AX prevents nucleases other than Artemis from
processing hairpin-sealed coding ends; in the absence of H2AX,
CtIP can efficiently promote the hairpin opening and resection of
DNA ends generated by RAG cleavage. This CtIP-mediated resection
is inhibited by γ-H2AX and by MDC-1 (mediator of DNA
damage checkpoint 1), which binds to γ-H2AX in chromatin flanking
DNA DSBs. Moreover, the ataxia telangiectasia mutated (ATM)
kinase activates antagonistic pathways that modulate this resection.
CtIP DNA end resection activity is normally limited to cells at
post-replicative stages of the cell cycle, in which it is essential for
homology-mediated repair**. In G1-phase lymphocytes, DNA ends
that are processed by CtIP are not efficiently joined by classical
NHEJ and the joints that do form frequently use micro-homologies
and show significant chromosomal deletions. Thus, H2AX preserves
the structural integrity of broken DNA ends in G1-phase
lymphocytes, thereby preventing these DNA ends from accessing
repair pathways that promote genomic instability.

V(D)J recombination, the reaction that assembles the second exon 
of all antigen receptor genes, requires the generation and repair of 
DNA DSBs and occurs exclusively during the G1 phase of the cell 
cycle’. The Rag-1 and Rag-2 proteins, which together form an endo- 
nuclease referred to as RAG, initiate V(D)J recombination by intro- 
ducing DNA DSBs at the border of two gene segments and their 
associated recombination signal (RS) RAG recognition sequences’. 
Cleavage by RAG generates a pair of hairpin-sealed coding ends and 
a pair of blunt signal ends’. These distinct pairs of DNA ends are 
processed and joined by NHEJ to form a coding joint and a signal 
joint, respectively~”. Signal ends undergo minimal nucleolytic proces- 
sing before joining. In contrast, hairpin-sealed coding ends must first 
be opened by an endonuclease and are also frequently processed by 
exonucleases before joining”’. This nucleolytic processing results in 
the antigen receptor gene sequence diversification that is essential for 
adaptive immunity. 

Efficient opening of the hairpin-sealed coding ends generated by
RAG cleavage is dependent on the Artemis nuclease**. Although other
cellular nucleases have enzymatic activity that could open and resect
hairpin-sealed coding ends, this functional redundancy is not evident
in Artemis-deficient cells; as a result, Artemis-deficient mice and
humans are severely lymphopenic’. Thus, the nucleolytic processing
of broken DNA ends in G1-phase lymphocytes must be tightly regulated;
however, the cellular components that mediate this regulation
are not known. Unrepaired coding ends are resolved as chromosomal
translocations at a higher frequency in lymphocytes deficient in both
Artemis and H2AX than in lymphocytes with an isolated deficiency of
Artemis®. Because the formation of these translocations requires the
Artemis-independent opening of hairpin-sealed DNA ends, we considered
that H2AX might function to restrict the ability of nucleases to
act on broken DNA ends in G1-phase lymphocytes.

To test this notion, we generated pre-B-cell lines transformed with
the v-abl kinase (hereafter referred to as abl pre-B cells), each of which
contained a single integrant of the pMX-DELCJ retroviral recombination
substrate and was wild type (WT:DELCJ), deficient in Artemis
(Artemis−/−:DELCJ) or deficient in both Artemis and H2AX
(Artemis−/−:H2AX−/−:DELCJ)**. pMX-DELCJ had a single pair of
recombination signals and rearranged by deletion, resulting in a coding
joint that remained within the chromosomal context (Fig. 1a)’.
Treatment of abl pre-B cells with the v-abl kinase inhibitor STI571
leads to cell cycle arrest in G1 and the induction of RAG, which in
NHEJ-deficient cells results in the accumulation of unrepaired coding
ends®®. In G1-arrested Artemis−/−:DELCJ abl pre-B cells, these
unrepaired coding ends were homogeneous in size, as expected given their
hairpin-sealed structure (Fig. 1b and Supplementary Figs 1 and 2). In
contrast, coding ends in G1-arrested Artemis−/−:H2AX−/−:DELCJ abl
pre-B cells were heterogeneous in size and significantly smaller (up to
1 kilobase), suggesting that these hairpin-sealed coding ends had been
opened and resected in an Artemis-independent fashion (Fig. 1b and
Supplementary Figs 1 and 2). H2AX also regulates coding-end resection
in DNA ligase IV-deficient (LigIV−/−) abl pre-B cells, in which
hairpin-sealed coding ends can be efficiently opened by Artemis but cannot be
ligated (Supplementary Fig. 3). Moreover, blunt chromosomal signal
ends generated by RAG cleavage at the pMX-DELSJ retroviral substrate
also show significant resection in LigIV−/−:H2AX−/−:DELSJ abl pre-B
cells (Supplementary Fig. 4)’.

To show that hairpin-sealed coding ends have been opened, DNA
end structure was assayed by TdT-assisted PCR, which can detect
hairpin-opened but not hairpin-sealed coding ends (Supplementary
Fig. 5a). In this regard, analysis of Artemis−/−:H2AX−/−:DELCJ abl
pre-B cells revealed robust TdT-assisted PCR products, indicating that
many of the hairpin-sealed coding ends had been opened
(Supplementary Fig. 5c). To quantify the fraction of open coding ends in
Artemis−/−:H2AX−/−:DELCJ abl pre-B cells, we performed Southern
blot analyses of native and denatured genomic DNA. Denaturing
hairpin-opened coding ends dissociates the complementary DNA
strands into single-stranded fragments that migrate differently from
hairpin-sealed coding ends, whose complementary strands are covalently
linked and therefore do not dissociate (Supplementary Fig. 6).
Analyses of denatured coding ends from Artemis−/−:H2AX−/−:DELCJ
Figure 1 | H2AX inhibits DNA end resection in G1-phase lymphocytes.
a, pMX-DELCJ retroviral recombination substrate unrearranged (UR), with
coding ends that are hairpin-sealed (hCE) or open (oCE), and coding joint (CJ).
Recombination signals (filled triangles), EcoRV sites (RV), the C4b probe (black
bar) and fragment sizes are shown. b, Southern blot analysis of EcoRV-digested
genomic DNA from wild-type (WT:DELCJ-119.6), Artemis−/−:DELCJ
(Artemis−/−:DELCJ-7) and Artemis−/−:H2AX−/−:DELCJ
(Artemis−/−:H2AX−/−:DELCJ-95) abl pre-B cell lines treated with STI571


abl pre-B cells revealed that up to 70% of the hairpin-sealed coding 
ends had been opened (Fig. 1c and Supplementary Fig. 6). 

The requirement for H2AX in regulating coding-end processing
was also observed at the immunoglobulin light chain (Igl) κ locus in
primary G1-phase bone-marrow-derived pre-B cell cultures from
Artemis−/− and Artemis−/−:H2AX−/− mice expressing an immunoglobulin
heavy chain (Igh) transgene (Ightg) (Supplementary Fig. 7)°*.
Southern blotting and TdT-assisted PCR revealed that the Jκ coding
ends in Artemis−/−:H2AX−/−:Ightg primary pre-B cells had open hairpins
and were resected, whereas those from Artemis−/−:Ightg pre-B
cells remained hairpin-sealed (Supplementary Fig. 8). Taken together,
these data show that H2AX restricts the activity of nucleolytic pathways
that would otherwise aberrantly resect unrepaired hairpin-sealed
or open coding ends in lymphocytes at the G1 phase of the cell cycle.

H2AX-dependent DNA damage responses generally depend on 
the phosphorylation of serine 139 by ATM or DNAPKcs to form 
Figure 2 | ATM and γ-H2AX regulate DNA end resection. a, pMX-DELCJ
Southern blot analysis (as described in Fig. 1a, b) of STI571-treated
LigIV−/−:H2AX−/−:DELCJ abl pre-B cells reconstituted with an empty
retroviral vector (empty) or vectors encoding wild-type H2AX (H2AX) or
(STI). C4b hybridizing UR, CJ and hCEs are indicated, as are oCEs that have
been resected (bracket). Tcrb constant region (Cb) probe hybridization was
used as a loading control. c, Quantification of denaturing Southern blot
analyses for open coding ends in LigIV−/−:DELCJ (LigIV−/−:DELCJ-10),
Artemis−/−:DELCJ (Artemis−/−:DELCJ-81) and Artemis−/−:H2AX−/−:DELCJ
(Artemis−/−:H2AX−/−:DELCJ-95 and Artemis−/−:H2AX−/−:DELCJ-124) abl
pre-B cells (see Supplementary Figs 2 and 3 for primary data).


γ-H2AX in chromatin flanking DNA DSBs, including those generated
by RAG. Reconstitution of LigIV−/−:H2AX−/−:DELCJ and
Artemis−/−:H2AX−/−:DELCJ abl pre-B cells with wild-type H2AX,
but not with a serine 139-to-alanine mutant (H2AXS139A), inhibited
coding-end resection in these cells (Fig. 2a and Supplementary Fig. 9).
These findings implicate γ-H2AX formation at broken DNA ends in
maintaining the structure of these ends in G1-phase lymphocytes but
do not exclude the possibility that H2AX also inhibits resection
through additional pathways that are independent of γ-H2AX formation.
Although ATM is required for the optimal formation of γ-H2AX
in chromatin flanking RAG DSBs, treatment of Artemis−/−:DELCJ abl
pre-B cells with the ATM kinase inhibitor KU-55933 did not lead to
the opening and resection of hairpin-sealed coding ends (Fig. 2b)’.
Rather, treatment of Artemis−/−:H2AX−/−:DELCJ abl pre-B cells with
KU-55933 resulted in a significant block in end resection, with a large
fraction of DNA ends in these cells being hairpin-sealed (Fig. 2b and
H2AXS139A. b, pMX-DELCJ Southern blot analysis of Artemis−/−:DELCJ and
Artemis−/−:H2AX−/−:DELCJ abl pre-B-cell lines treated with either the ATM
inhibitor (ATM) KU-55933 (+) or a dimethylsulphoxide vehicle control (−).
data not shown). We conclude that ATM functions to inhibit coding-end
resection through the formation of γ-H2AX; however, ATM activity
is also required to promote this nucleolytic resection. Thus, ATM
modulates the function of antagonistic pathways that both positively
and negatively regulate DNA end resection in G1-phase lymphocytes.

We show that, in the absence of H2AX, nucleases other than Artemis
can efficiently open and resect hairpin-sealed coding ends in a manner
that is dependent on ATM. In this regard, ATM positively regulates
CtIP activity in promoting DNA end resection’. CtIP binds directly to
Nbs1, a component of the Mre11-Rad50-Nbs1 (MRN) complex that
associates with RAG DSBs and functions in their repair.
Furthermore, Sae2, the S. cerevisiae orthologue of CtIP, functions
with Mre11 to promote the opening and resection of hairpin-sealed
DNA ends in vitro'’. Taken together, these data suggest that H2AX
could regulate the ability of CtIP to mediate hairpin opening and DNA
end resection in G1-phase lymphocytes. Indeed, knockdown of CtIP in
both Artemis−/−:H2AX−/−:DELCJ and LigIV−/−:H2AX−/−:DELCJ abl
pre-B cells largely blocked the aberrant coding-end resection observed
in these cells (Fig. 3a, b and Supplementary Figs 10 and 11). Moreover,
in CtIP-deficient Artemis−/−:H2AX−/−:DELCJ abl pre-B cells most
unrepaired coding ends were hairpin-sealed (Fig. 3c). Although our
data show that the hairpin coding ends in Artemis−/−:H2AX−/−:DELCJ
abl pre-B cells have been opened, we cannot determine the position of
opening or the extent of resection after opening. However, it is notable
that Sae2 can mediate hairpin opening at significant distances from the
hairpin tip in vitro’’.

The DNA-damage-response protein MDC-1 is recruited by
γ-H2AX to chromatin flanking DNA DSBs. We found that, like
γ-H2AX, MDC-1 was also required for the inhibition of the
ATM-dependent resection of coding ends in G1-phase lymphocytes
(Supplementary Fig. 12). Because CtIP binds to the FHA domain of
Nbs1 that also binds MDC-1, this raises the possibility that, on
recruitment to DSBs by γ-H2AX, MDC-1 may disrupt CtIP-Nbs1
interactions. The DNA-damage-response protein 53BP1, which is
also retained at DSB sites in a γ-H2AX-dependent manner, regulates
DNA end resection during V(D)J recombination in thymocytes and
during immunoglobulin class switch recombination. In addition,
53BP1 inhibits CtIP-dependent DNA end resection in Brca-1-deficient
cells at post-replicative stages of the cell cycle; thus, γ-H2AX may
inhibit DNA end resection in these cells by recruiting 53BP1 (ref.
25). However, at post-replicative stages of the cell cycle, H2AX also
functions to promote DNA DSB repair by homologous recombination,
which requires CtIP-mediated DNA end resection. Thus, H2AX
may function in a cell-cycle-specific manner to modulate the activity
of several pathways that regulate DNA end resection.

What is the fate of broken DNA ends processed by CtIP in G1-phase
lymphocytes? During V(D)J recombination, Artemis functions primarily
to open hairpin-sealed coding ends, after which core NHEJ factors join
these DNA ends. However, hairpin-sealed coding ends opened in a
CtIP-dependent manner persisted unrepaired at high levels in
Artemis−/−:H2AX−/− abl pre-B cells (Fig. 1b and Supplementary Fig.
2). In this regard, the single-strand overhangs generated by
CtIP-mediated resection during homologous recombination would probably
be poor substrates for NHEJ in G1-phase cells. However, this resection
could expose regions of homology flanking the DSB, which, if used to
mediate DSB repair by homology-driven repair pathways, would form
joints with chromosomal deletions. Indeed, PCR and sequence analyses
of coding joints formed in Artemis−/−:H2AX−/−:DEL^CJ abl pre-B cells
revealed that they were heterogeneous in size, in contrast with those
formed in WT:DEL^CJ or Artemis−/−:DEL^CJ abl pre-B cells (Fig. 4a, b
and Supplementary Figs 13 and 14). These joints had significant deletions
extending up to 700 base pairs, which is the maximum size deletion that
would be amplified by the PCR approach used (Fig. 4a, b and
Supplementary Fig. 14). Moreover, the coding joints formed in
Artemis−/−:H2AX−/−:DEL^CJ abl pre-B cells used microhomologies at
a higher frequency than those formed in WT:DEL^CJ abl pre-B cells
(50% versus 5%; Fig. 4c and Supplementary Fig. 14). Analysis of T-cell
receptor β (Tcrb) Db1 to Jb1.1/Jb1.2 joints in Artemis−/−:H2AX−/−
thymocytes revealed that these joints similarly showed significant
deletions compared with those formed in either Artemis−/− or wild-type
thymocytes (Supplementary Fig. 15). We conclude that RAG-mediated
DNA breaks generated in H2AX-deficient lymphocytes that are
processed in a CtIP-dependent manner are resistant to repair by classical
NHEJ. However, these DNA ends can be channelled into homology-driven
repair pathways, resulting in joints that form significant
chromosomal deletions.

The requirement for H2AX in the prevention of CtIP-dependent
resection and the resolution of RAG DSBs as chromosomal deletions is
congruent with the phenotype of H2AX-deficient mice, which are
predisposed to lymphoid tumours that can harbour chromosomal
lesions indicative of aberrantly resolved RAG DSBs. However,
chromosomal V(D)J recombination proceeds efficiently in
H2AX-deficient abl pre-B cells. Moreover, DbJb joints formed in H2AX−/−
thymocytes did not exhibit significant deletions or an increase in
microhomology usage in comparison with wild-type thymocytes
(Supplementary Figs 16 and 17). Thus, H2AX may be required for
the repair of a limited subset of RAG DSBs. Alternatively, other
proteins may compensate for a more general requirement for H2AX
in DNA DSB repair during V(D)J recombination. In agreement with
this notion, whereas isolated deficiencies in H2AX or XRCC4-like
factor (XLF) have a minimal effect on chromosomal V(D)J
recombination, the combined deficiency of H2AX and XLF in abl pre-B cells
results in a significant accumulation of unrepaired coding ends.
Moreover, these coding ends are more extensively resected than those
in either Artemis−/−:H2AX−/− or LigIV−/−:H2AX−/− abl pre-B cells,
raising the possibility that H2AX and XLF have overlapping activities
in modulating DNA end resection.

Figure 3 | H2AX prevents CtIP-mediated DNA end resection. a, b,
pMX-DEL^CJ Southern blot analysis (as described in Fig. 1a, b) of
STI571-treated LigIV−/−:H2AX−/−:DEL^CJ (a) and
Artemis−/−:H2AX−/−:DEL^CJ (b) abl pre-B cells expressing either
non-targeting (NT) or CtIP-specific (CtIP) shRNAs. c, Quantification of
denaturing Southern blot analysis for open CEs from STI571-treated
Artemis−/−:H2AX−/−:DEL^CJ abl pre-B cells expressing either NT or
CtIP-specific shRNAs.

Figure 4 | Aberrant joining in H2AX-deficient cells. a, PCR products
representing normal and deleted (bracketed) pMX-DEL^CJ CJs from
WT:DEL^CJ, Artemis−/−:DEL^CJ and Artemis−/−:H2AX−/−:DEL^CJ abl pre-B
cells treated with STI571. Serial fivefold dilutions of genomic DNA were
amplified. IL-2 gene PCR was used as a loading control. b, c, Base pairs
deleted (b) and microhomology utilization (c) in pMX-DEL^CJ CJs
sequenced from WT:DEL^CJ and Artemis−/−:H2AX−/−:DEL^CJ abl pre-B cells
(Supplementary Fig. 14). The total number of sequenced coding joints (n)
is indicated in c. The pie chart shows the fraction of joints with 1, 2
or at least 3 microhomologies and the total number (centre) of joints
with microhomologies. d, Model for H2AX function in DNA end processing
in G1-phase lymphocytes. Red blocks represent homologous sequences.
Results are shown as means and s.e.m.; P values were determined by using
Student’s t-test with Welch’s correction for unequal variances.

We have shown that H2AX maintains the structural integrity of
broken DNA ends generated during V(D)J recombination in G1-phase
lymphocytes. This protective function of H2AX would ensure
that unrepaired DNA ends are either joined through classical NHEJ or
signal for elimination of the cell by apoptosis (Fig. 4d). In the absence
of H2AX, ATM/CtIP-dependent resection creates genomic instability 
by allowing DNA ends to be shuttled into homology-driven repair 
pathways that can form potentially dangerous chromosomal deletions 
(Fig. 4d). Formation of chromosomal translocations in H2AX- 
deficient mice may rely on these defects in DNA end processing 
coupled with the diminished retention of coding ends in post-cleavage 
complexes in H2AX-deficient cells. Significant parallels can be drawn
between mechanisms that protect, process and repair RAG DSBs and
those incurred by environmental genotoxins. In this regard, a subset of
genotoxic DSBs require Artemis for their repair by NHEJ, suggesting
that these broken DNA ends must undergo nucleolytic processing.
Furthermore, the repair of these genotoxic DSBs may also depend on
H2AX. Thus, our finding that H2AX regulates the processing of
unrepaired DNA ends during V(D)J recombination may reflect a 
broader function of H2AX in regulating the nucleolytic processing 
of DNA DSBs generated during other physiological processes and by 
genotoxic agents. 


METHODS SUMMARY 


Abl pre-B cell lines containing the pMX-DEL^CJ retroviral recombination substrate
were generated and maintained, and interleukin (IL)-7-dependent pre-B cells were
cultured as described previously. Standard protocols for Southern blotting
(native and denaturing), western blotting, flow cytometry, retroviral-mediated
protein expression and short hairpin RNA (shRNA)-mediated knockdown were
followed. TdT-assisted PCR for coding ends generated during rearrangement of
pMX-DEL^CJ and the Iglk locus was performed as described in Methods.


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature.

METHODS


Mice. All animals were housed in a specific pathogen-free facility at Washington 
University School of Medicine, and all animal protocols were approved by the 
Washington University Institutional Animal Care and Use Committee. 

Cell line generation and cell culture. Rag−/−, Artemis−/− and WT abl pre-B cells
were described previously. Artemis−/−:H2AX−/− v-abl-transformed pre-B cells
were generated as described previously. LigIV−/− and LigIV−/−:H2AX−/− abl
pre-B cells were generated by treating LigIV^loxP/loxP and LigIV^loxP/loxP:H2AX−/−
abl pre-B cells with a Tat-Cre fusion protein. Cells were incubated for 1 h in
medium containing 50 µg ml−1 Tat-Cre and subcloned 48 h later. Cre-mediated
deletion of the DNA Ligase IV gene was confirmed by PCR and Southern blotting.
All lines were transduced with pMX-DEL^CJ or pMX-DEL^SJ by co-centrifugation,
and clonal populations each containing a single integrant of the retroviral
substrate were isolated by limiting dilution as described previously. Cells were treated
with 3 µM STI571 (Novartis) for the indicated durations at a concentration of
10^6 cells ml−1. KU-55933 (Tocris) was used at a concentration of 15 µM. Primary
bone-marrow pre-B-cell cultures were generated by harvesting bone marrow from
Rag−/−:IgHtg, Artemis−/−:IgHtg or Artemis−/−:H2AX−/−:IgHtg mice followed by
culture for 6-10 days in medium containing 5 ng ml−1 IL-7 maintained at a
concentration of about 2 × 10^6 cells ml−1 before withdrawal of IL-7 (ref. 8).
Retroviral reconstitution and lentiviral knockdown. Reconstitution of
Artemis−/−:H2AX−/− and LigIV−/−:H2AX−/− abl pre-B cells was performed by
retroviral transduction with either empty retrovirus or retrovirus containing cDNAs
encoding H2AX or H2AX^S139A. A cDNA encoding H2AX^S139A (serine 139 changed
to alanine) was generated by PCR-based site-directed mutagenesis of WT H2AX
cDNA. cDNAs encoding WT H2AX and H2AX^S139A were cloned into the pMX-PIE
retroviral vector and cells were transduced by co-centrifugation as described
previously. Cells expressing the retroviral construct were obtained by flow
cytometric cell sorting of cells expressing GFP using a FACSVantage (BD Biosciences).
Generation of lentiviral shRNA vectors was performed with the previously
described pFLRU:YFP lentiviral vector. CtIP-specific and non-targeting (NT)
shRNAs were cloned into the pFLRU:YFP lentiviral vector. Sequences targeted
by the shRNA were CtIP (5'-GAGCAGACCTTTCTCAGTA-3’) and NT
(5'-GGTTCGATGTCCCAATTCTG-3’). These pFLRU:shRNA:YFP vectors
(4 µg) were individually co-transfected with 4 µg of pHR′Δ8.2R packaging vector
and 1 µg of pCMV-VSVg envelope plasmid into HEK293T cells plated at 70%
confluence in 6-cm plates, using Lipofectamine 2000 (Invitrogen). Medium was
replaced 12 h after transfection. Supernatants were harvested 24 h later.
Transduction of abl pre-B cells was performed by co-centrifugation with viral
supernatant at 1,800 r.p.m. (650g) for 90 min, with Polybrene added to 5 µg ml−1.
Cells expressing the pFLRU-shRNA vectors were obtained by flow cytometric cell
sorting of cells expressing YFP, with a FACSVantage (BD Biosciences).
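
The pairing of 1,800 r.p.m. with 650g in the transduction step follows the standard relation between rotor speed and relative centrifugal force (RCF). As a minimal sketch, assuming the usual approximation RCF = 1.118 × 10^-5 × r × N^2 (r in cm, N in r.p.m.), the snippet below converts between the two. The rotor radius is not stated in the text, so the value computed here is only what the quoted numbers imply, and the function names are illustrative.

```python
# Hypothetical helpers for illustration; not part of the published protocol.
RCF_CONSTANT = 1.118e-5  # standard factor for radius in cm and speed in r.p.m.


def rcf_from_rpm(rpm, radius_cm):
    """Return the relative centrifugal force (multiples of g)."""
    return RCF_CONSTANT * radius_cm * rpm ** 2


def radius_from_rcf(rcf, rpm):
    """Return the rotor radius (cm) implied by a given RCF and speed."""
    return rcf / (RCF_CONSTANT * rpm ** 2)


# Radius implied by the quoted 1,800 r.p.m. and 650g (an inference, not a
# value given in the Methods).
implied_radius = radius_from_rcf(650, 1800)
```

Reproducing the spin on a different rotor would mean holding the RCF fixed and recomputing the speed for that rotor's radius.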

Southern blotting analyses. Standard Southern blot analysis of pMX-DEL^CJ and
pMX-DEL^SJ was performed with EcoRV-digested genomic DNA and the C4b
probe as described previously. Southern blot analyses of coding ends generated
during rearrangement at the Iglk locus were performed as described previously on
genomic DNA digested with SacI and EcoRI, with the JκIII probe. Denaturing
agarose-gel electrophoresis was performed as described previously, with
modifications. In brief, 40 µg of genomic DNA was digested overnight with EcoRV in a
400-µl volume and concentrated to 30 µl. The DNA was then resuspended with
the addition of 5 volumes of a solution containing 8 M urea, 1% Nonidet P40,
1 mM Tris-HCl pH 8.0 and 0.5 mg ml−1 bromophenol blue. This DNA solution
was divided in two; one half was heated at 90 °C for 8 min to denature the genomic
DNA, and the other half was incubated on ice. After 8 min, the heated DNA
samples were placed on ice before electrophoresis at 50 V and 4 °C on a 1.2%
agarose Tris-acetate-EDTA gel with 1 M urea for about 24 h in TAE buffer also
containing 1 M urea. Quantification was performed with ImageJ software. To
calculate the percentage of hairpin-sealed coding ends, we measured the integrated 
density (ID) of the band representing open coding ends (oCEs) and that representing 
hairpin-sealed coding ends (hCEs) and subtracted background ID levels to obtain 
the corrected ID levels for each (corr. ID_oCE and corr. ID_hCE, respectively). The ID for
the closed ends was divided by the sum of the IDs of closed and open ends and
converted to a percentage: hCE (%) = 100 × (corr. ID_hCE)/(corr. ID_oCE + corr. ID_hCE).
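
The quantification described above can be sketched in a few lines. This is an illustrative calculation only: the function and argument names are hypothetical, and it assumes a single background integrated density is subtracted from both bands, which the text does not specify.

```python
def hairpin_sealed_percent(id_open, id_hairpin, background=0.0):
    """Percentage of coding ends that are hairpin-sealed (hCE %).

    id_open and id_hairpin are the raw integrated densities of the open
    and hairpin-sealed coding-end bands. Subtracting one shared
    background value from both is an assumption of this sketch.
    """
    corr_open = id_open - background
    corr_hairpin = id_hairpin - background
    total = corr_open + corr_hairpin
    if total <= 0:
        raise ValueError("corrected band intensities must sum to a positive value")
    return 100.0 * corr_hairpin / total


# Made-up densities: open band 350, hairpin band 150, background 50.
example = hairpin_sealed_percent(350, 150, background=50)
```

With these made-up values the corrected densities are 300 and 100, so one quarter of the coding ends would be scored as hairpin-sealed.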
TdT-assisted PCR. TdT-assisted PCR analysis of pMX-DEL^CJ coding ends was
performed as described previously, with the IRES REV5 oligonucleotide
(5'-CTCGACTAAACACATGTAAAGC-3’) for the primary PCR reaction, the
IRES REV4 oligonucleotide (5'-CCCTTGTTGAATACGCTTG-3’) for the secondary
PCR reaction and the I4 oligonucleotide (5'-TAAGATACACCTGCAAAGGCG-3’)
as a probe. Similar conditions were used for PCR analyses of coding ends generated
during rearrangement of the endogenous Iglk locus, using the Jk2 ds oligonucleotide
(5'-CCACAAGAGGTTGGAATGATTTTC-3’) for amplification and the Jk
oligonucleotide (5'-GTAGTCTTCTCAACTCTTGTTCACT-3’) as a probe. IL-2 gene
PCR, which served as a DNA loading control, was performed as described
previously.

Analysis of pMX-DEL^CJ coding joints. pMX-DEL^CJ coding joints were amplified
by using oligonucleotides pC (5'-GCACGAAGTCTTGAGACCT-3’) and IRES
REV5 (5'-CTCGACTAAACACATGTAAAGC-3’). Genomic DNA (300 ng) was
used in the original amplification with serial fivefold dilutions. Oligonucleotide
IR4 (5'-CCCTTGTTGAATACGCTTG-3’) was used as a probe. Cloning and
sequencing of pMX-DEL^CJ coding joints was performed as described previously.
P values for both Fig. 4b and Fig. 4c were calculated by Student’s t-test with Welch’s
correction for unequal variances.
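
For readers unfamiliar with Welch's correction, the sketch below computes the test statistic and the Welch-Satterthwaite degrees of freedom from two samples using only the Python standard library. It illustrates the general method and is not the authors' analysis; the input numbers are invented and `welch_t` is a hypothetical helper name.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for Welch's unequal-variance t-test."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each mean, from the sample variance.
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t = (mean(sample_a) - mean(sample_b)) / sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df


# Invented example values, not data from the paper.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

A P value then follows from the t distribution with `dof` degrees of freedom; if SciPy is available, `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the same test and returns the P value directly.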

Analysis of Tcrb coding joints. Db1-Jb1.1 and Db1-Jb1.2 CJs were amplified by 
using oligonucleotides Db1 us (5’-CCTTCCTTATCTTCAACTC-3’) and Jb1.2 ds 
(5'-CCTGACTTCCACCCGAGGTT-3’) with the following PCR parameters: 
92°C for 1min30s, 55°C for 2min30s, 72°C for 1min. Thymic DNA 
(300 ng) was used in the original amplification (13 cycles). PCR products were 
digested with BglII for 3h before a second round of amplification (30 cycles) to 
permit the isolation of DJb CJs without the competing amplification from the 
germline Tcrb locus. Southern blot analyses of the PCR products were performed 
with oligonucleotide Jb (5'-GTAATCAGAGGAAGGATG-3’) as a probe. Thymic 
DNA isolated from three individual mice for each genotype (WT, Artemis−/− and
Artemis−/−:H2AX−/−) was analysed. Db1-Jb1.1 CJs from WT and H2AX−/−
thymic DNA were cloned and sequenced. 

Flow cytometric analyses. Flow cytometric analyses were performed on a
FACSCalibur (BD Biosciences) with fluorescein isothiocyanate-conjugated
anti-CD45R/B220 and allophycocyanin-conjugated IgM and the appropriate isotype
control. All antibodies were from BD Biosciences. Cell cycle analyses were
performed by assessing DNA content by incubation with Hoechst 33342 dye
(Invitrogen) for 1 h at 37 °C before flow cytometric analysis. The percentages of
cells in G1 and S + G2 + M were approximated with the Dean-Jett-Fox method
in FlowJo 8.8.6 software.

ATM damage response and XLF repair factor are
functionally redundant in joining DNA breaks

Classical non-homologous DNA end-joining (NHEJ) is a major
mammalian DNA double-strand-break (DSB) repair pathway. 
Deficiencies for classical NHEJ factors, such as XRCC4, abrogate 
lymphocyte development, owing to a strict requirement for 
classical NHEJ to join V(D)J recombination DSB intermediates.
The XRCC4-like factor (XLF; also called NHEJ1) is mutated in
certain immunodeficient human patients and has been implicated
in classical NHEJ; however, XLF-deficient mice have relatively
normal lymphocyte development and their lymphocytes support
normal V(D)J recombination. The ataxia telangiectasia-mutated
protein (ATM) detects DSBs and activates DSB responses by
phosphorylating substrates including histone H2AX. However, ATM
deficiency causes only modest V(D)J recombination and lymphocyte
developmental defects, and H2AX deficiency does not have a
measurable impact on these processes. Here we show that XLF,
ATM and H2AX all have fundamental roles in processing and 
joining DNA ends during V(D)J recombination, but that these 
roles have been masked by unanticipated functional redundancies. 
Thus, combined deficiency of ATM and XLF nearly blocks mouse 
lymphocyte development due to an inability to process and join 
chromosomal V(D)J recombination DSB intermediates. Combined 
XLF and ATM deficiency also severely impairs classical NHEJ, but 
not alternative end-joining, during IgH class switch recombination. 
Redundant ATM and XLF functions in classical NHEJ are mediated 
by ATM kinase activity and are not required for extra-chromosomal 
V(D)J recombination, indicating a role for chromatin-associated 
ATM substrates. Correspondingly, conditional H2AX inactivation 
in XLF-deficient pro-B lines leads to V(D)J recombination defects 
associated with marked degradation of unjoined V(D)J ends, reveal- 
ing that H2AX has a role in this process. 

Assembly of immunoglobulin and T-cell-receptor variable region
exons is initiated by the RAG1 and RAG2 endonuclease (hereafter
referred to as RAG), which generates DNA DSBs between a pair of
participating V, D, or J coding segments and flanking recombination
signal sequences. V(D)J recombination is completed via joining,
respectively, of the two coding segments and two recombination signal
sequences by classical NHEJ. Whereas XLF-deficient (XLFΔ/Δ)
embryonic stem cells and mouse embryonic fibroblasts are impaired for
V(D)J recombination of extra-chromosomal substrates, XLFΔ/Δ mice
are only modestly impaired for lymphocyte development, and XLFΔ/Δ
pro-B lines, although having increased sensitivity to ionizing radiation
(IR), perform nearly normal V(D)J recombination. Thus, unknown
factors may compensate for XLF V(D)J recombination functions in
developing lymphocytes. Among the candidates, we considered
ATM, which is activated by RAG-generated DSBs. To elucidate
whether ATM has an overlapping V(D)J joining function with XLF,
we bred XLFΔ/Δ mice with ATM-deficient (Atm−/−) mice to generate
XLFΔ/ΔAtm−/− mice. XLFΔ/ΔAtm−/− mice were live born but
were significantly smaller than control littermates (Supplementary
Fig. 1).

XLFΔ/Δ and Atm−/− mice had only a modest (2-3-fold) reduction in
thymocyte numbers and no gross alterations in thymocyte development,
as revealed by staining for the CD4 and CD8 differentiation
markers (Fig. 1a, b and Supplementary Table 1). In contrast,
XLFΔ/ΔAtm−/− mice had a greater than 20-fold decrease in thymocyte
numbers, to levels nearly as low as those of Rag2−/− mice, with an
overall developmental pattern reminiscent of that of certain classical
NHEJ-deficient mice with a ‘leaky’ V(D)J recombination block. B-cell
development was also relatively unimpaired in XLF- and ATM-deficient
mice, with both having only modestly reduced (2-3-fold)
B220+IgM+ splenic B-cell numbers (Fig. 1a, c and Supplementary
Table 1). In contrast, XLFΔ/ΔAtm−/− mice had extremely low splenic
B-cell numbers (Fig. 1a, c and Supplementary Table 1). Analyses of
bone marrow B-cell development in XLFΔ/ΔAtm−/− mice suggested
an impairment at the CD43+B220+ progenitor (pro-) B-cell stage in
which V(D)J recombination is initiated, as shown by the near absence of
B220+CD43− precursor B cells (Fig. 1a). To test further whether
impaired B-cell development in XLFΔ/ΔAtm−/− mice involved a
V(D)J recombination defect, we bred IgH and IgL loci that contained
knock-in mutations of pre-assembled IgH and IgL variable region exons
(referred to as HL) into the XLFΔ/ΔAtm−/− background and found a
significant rescue of B-cell, but not T-cell, development (Fig. 1a-c and
Supplementary Table 1). Together, these findings indicate that XLF/
ATM double-deficiency severely impairs T- and B-cell development by
impairing V(D)J recombination.

To test unequivocally for V(D)J recombination end-joining defects,
we generated v-abl transformed pro-B-cell lines from wild-type,
XLFΔ/Δ, Atm−/− and XLFΔ/ΔAtm−/− mice that carried Bcl-2
transgenes. Treatment of v-abl transformed pro-B lines with STI571, a
v-abl kinase inhibitor, arrests cells in G1 and induces RAG, leading
to efficient V(D)J recombination of integrated substrates in wild-type
cells. The Bcl-2 transgene obviates apoptotic effects of STI571 (ref. 8).
We generated multiple pro-B lines from each genotype that
harboured, respectively, either a V(D)J recombination substrate designed
to assay coding joins (CJs) and unjoined coding ends (CEs) (Fig. 2a) or
a substrate designed to assay recombination signal sequence joins (SJs)
and unjoined recombination signal sequence ends (SEs) (Fig. 2b). For
these experiments, DNA from individual lines was prepared at day 0
(before treatment), day 2 and day 4 of STI571 treatment, digested with
restriction endonucleases and assayed for hybridization to the
indicated probes (Fig. 2c, d). Wild-type and XLFΔ/Δ lines generated
substantial CJ and SJ levels at days 2 and 4, with little or no accumulation
of the free CEs that would indicate a classical NHEJ defect (Fig. 2c, d).
Atm−/− lines also generated substantial levels of CJs and SJs, but,
consistent with previous studies, also generated a modest level of
unjoined CEs at day 2 that appeared partially resolved by day 4
(Fig. 2c). However, there was no obvious recombination signal sequence
joining defect in the Atm−/− lines (Fig. 2d). In contrast, XLFΔ/ΔAtm−/−
lines had little accumulation of CJs or SJs at either time point and,
instead, accumulated unjoined CEs and SEs, respectively (Fig. 2c, d).

Figure 1 | ATM and XLF have redundant functions in lymphocyte
development. a, Representative flow cytometric analyses of bone marrow,
spleen and thymus from wild-type, XLFΔ/Δ, Atm−/−, Rag2−/−, XLFΔ/ΔAtm−/−
and XLFΔ/ΔAtm−/− HL mice (see Methods for further description of mouse
lines). Numbers on the plot are percentages of total cells represented by the
indicated population. b, c, Total thymocyte and CD4+CD8+ double positive
(DP) thymocyte numbers (b) and IgM+ splenic B-cell numbers (c). Each value
listed represents the average ± standard deviation from at least three mice
between 4 and 12 weeks of age. See Supplementary Table 1 for details.

We also tested for V(D)J recombination defects with a substrate that
activates a GFP gene upon inversional V(D)J recombination and
which, via Southern blotting, reveals CJs, hybrid joins (aberrant joins
in which a recombination signal sequence is fused to a coding end) and
free CEs (Fig. 2e, f). We clonally integrated a single-copy inversional
V(D)J substrate into XLFΔ/Δ pro-B lines that were also homozygous for
a conditional knockout ATM allele (AtmC/C) and then deleted floxed
AtmC/C sequences via Cre recombinase to generate XLFΔ/ΔAtm−/−
lines with the same substrate integration. Thus, these matched sets of
lines allow assay of a given integrated substrate in XLFΔ/Δ pro-B lines
before and after elimination of ATM. We treated inversional substrate
containing wild-type, Atm−/−, XLFΔ/ΔAtmC/C, XLFΔ/ΔAtm−/− and
Xrcc4−/− lines with STI571 and assayed for V(D)J recombination both
by GFP expression (Supplementary Fig. 2) and Southern blotting
(Fig. 2f). Both assays confirmed the severe V(D)J recombination defect
in XLFΔ/ΔAtm−/− pro-B lines and Southern blotting confirmed
severely defective end-joining, as revealed by a marked decrease in
CJs and a marked increase in unjoined CEs (Fig. 2f). The severity of
the inversional V(D)J joining defect in XLFΔ/ΔAtm−/− pro-B lines was
similar to that of XRCC4-deficient pro-B lines (Fig. 2f and
Supplementary Fig. 2). XLFΔ/Δ pro-B lines treated with an ATM kinase
inhibitor also showed a severe end-joining defect during V(D)J
recombination, indicating that the ATM-mediated V(D)J joining activity
revealed in XLF-deficient lines is mediated by ATM kinase activity
(Fig. 2f). Finally, STI571-treated XLFΔ/ΔAtm−/− pro-B lines
accumulated unrepaired V(D)J recombination-associated breaks within their
endogenous Igk locus, similar to those observed in Artemis−/− (also
called Dclre1c) pro-B lines, confirming that the V(D)J recombination
defect associated with combined XLF and ATM deficiency extends to
this endogenous immunoglobulin locus (Supplementary Fig. 3).

To characterize further the V(D)J recombination defect in
XLFΔ/ΔAtm−/− versus wild-type, XLFΔ/Δ, Atm−/− and Xrcc4−/−
pro-B lines, we assayed for V(D)J recombination on transiently introduced
extra-chromosomal substrates. As this assay is semi-quantitative,
within perhaps a fivefold range, and is best for revealing profound
defects, we performed at least four independent assays for each
genotype (Supplementary Table 1). As expected, transient coding and
recombination signal sequence joining activity for XRCC4-deficient
cells was more than 50-fold less than that of wild-type cells, whereas
coding and recombination signal sequence joining activity for XLFΔ/Δ
and Atm−/− cells approached the wild-type range (Supplementary
Table 2). Surprisingly, the range of coding and recombination signal
sequence joining activity of XLFΔ/ΔAtm−/− pro-B lines, although
potentially modestly decreased, overlapped that of wild-type and single
mutant cells (Supplementary Table 2). Thus, in contrast to severe
defects in chromosomal V(D)J recombination, XLF/ATM double
mutant pro-B lines lack severe defects in extra-chromosomal V(D)J
recombination.

The chromosomal V(D)J joining defect in XLFΔ/ΔAtm−/− pro-B
lines can be attributed to impaired classical NHEJ, as V(D)J
recombination exclusively uses this pathway. However, the question
remains as to whether XLFΔ/ΔAtm−/− cells have more general classical
NHEJ defects and whether they are impaired for other forms of
end-joining. IgH class switch recombination (CSR) involves the
introduction of DSBs into the switch (S) region upstream of the Cμ constant
region exons and their joining to DSBs within a downstream S region,
resulting in IgH CSR. Although classical NHEJ is a major CSR joining
pathway, CSR occurs at reduced levels in classical NHEJ-deficient cells
via alternative end-joining (A-EJ). To assay CSR, we activated
wild-type, XLFΔ/Δ, Atm−/− and XLFΔ/ΔAtm−/− HL B cells for 4 days with
anti-CD40 and interleukin (IL)-4 to stimulate CSR to IgG1. As
expected, XLFΔ/Δ and Atm−/− B cells switched to IgG1 at about
40% of wild-type levels. Moreover, XLFΔ/ΔAtm−/− HL B cells also
showed substantial residual IgG1 CSR that was on average about 25%
of wild-type levels (Fig. 3a, b and Supplementary Fig. 4). To gain
further insight into involved joining pathways, we sequenced the Sμ
to Sγ1 junctions. Classical NHEJ generates CSR junctions with no
microhomology (that is, direct joins) and junctions with short
(1-2 bp) microhomologies, whereas A-EJ generates CSR junctions
that predominantly contain microhomologies. As expected, about
40% of wild-type junctions were direct, whereas only about 22%
and 13%, respectively, of Atm−/− and XLFΔ/Δ junctions were direct,
indicating some classical NHEJ impairment in these mutant B cells
(Fig. 3c and Supplementary Fig. 4). However, only about 5% of
XLFΔ/ΔAtm−/− CSR joins were direct (Fig. 3c and Supplementary
Fig. 4), consistent with most of their residual CSR being carried out
by A-EJ. These results indicate that combined XLF and ATM
deficiency impairs general classical NHEJ during CSR, but does not
substantially impair A-EJ.
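
The distinction between direct joins and microhomology-mediated joins can be made concrete with a small sequence comparison. The following is a minimal sketch, not the scoring procedure used in the paper: it assumes the junction read starts at the first base of the upstream germline reference and ends at the last base of the downstream germline reference, and the function name and test sequences are invented for illustration.

```python
def junction_microhomology(upstream_ref, downstream_ref, junction):
    """Number of bases at the join that could derive from either parent.

    Returns 0 for a direct join. Junctions carrying inserted nucleotides
    also return 0 here and would need separate handling.
    """
    # Longest junction prefix that matches the upstream germline sequence.
    prefix = 0
    for a, b in zip(junction, upstream_ref):
        if a != b:
            break
        prefix += 1
    # Longest junction suffix that matches the downstream germline sequence.
    suffix = 0
    for a, b in zip(reversed(junction), reversed(downstream_ref)):
        if a != b:
            break
        suffix += 1
    # Bases claimed by both matches are the microhomology.
    return max(prefix + suffix - len(junction), 0)


# Invented sequences: the two parents share "CGGA" at the breakpoint.
mh = junction_microhomology("ACGTACGGA", "TTCGGATCA", "ACGTACGGATCA")
```

Tallying the fraction of junctions for which this value is zero gives the proportion of direct joins of the kind compared across genotypes above.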

Figure 2 | ATM and XLF have redundant functions in chromosomal V(D)J
recombination. a, b, e, Schematic of pMX-DEL-CJ (a), pMX-DEL-SJ (b) and
pMX-INV (e) retroviral recombination substrates designed to assay CJ, SJ and
inversional V(D)J recombination, respectively. Diagrams indicate
un-rearranged substrate (UR), coding/signal end (CE/SE) intermediates and
coding/signal joints (CJ/SJ). The 12-recombination signal sequence (12-RS;
open triangle), GFP coding sequence, 23-recombination signal sequence
(23-RS; filled triangle), IRES-truncated hCD4 cDNA (I-hCD4) and LTRs are
indicated. Positions of EcoRV (EV) sites, NcoI (N) sites and C4 probe (black
bar) are shown. c, d, Southern blotting with C4 probe of EcoRV-digested DNA

Our finding that the overlapping function of ATM with XLF
involves ATM kinase activity and is required for chromosomal but not
extra-chromosomal V(D)J joining indicates that this function involves
ATM substrates. In response to DSBs, ATM phosphorylates H2AX.
However, H2AX deficiency is not known to have a detectable impact
on V(D)J recombination. To test for overlapping H2AX and XLF
functions, we inter-crossed XLFΔ/Δ mice that were heterozygous for an
inactivating mutation of H2AX (H2AX+/−; H2AX also called
H2afx). Notably, these crosses yielded no XLFΔ/ΔH2AX−/− pups, with
embryonic death of double homozygous mutants occurring before
embryonic day 13.5 (Table 1). The finding that combined XLF and
H2AX deficiency, but not combined XLF and ATM deficiency, is
embryonically lethal might have several explanations. One is that the
lethality results from ATM-independent S-phase functions of H2AX.
Another is that impaired checkpoint functions associated with ATM
deficiency rescue downstream effects of classical NHEJ deficiency that,
otherwise, could result in embryonic death.
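
The strength of the conclusion that double homozygous mutants die before birth depends on how many pups were scored. As a rough sketch, assuming simple Mendelian segregation of an autosomal allele in an H2AX+/− intercross (one quarter of offspring expected to be H2AX−/−), the chance of seeing no such pups by sampling alone falls quickly with litter count. Table 1 is not reproduced in this text, so the pup number used below is hypothetical, as is the function name.

```python
def prob_zero_observed(n_pups, expected_fraction=0.25):
    """Probability that none of n_pups carries a genotype expected at
    the given Mendelian fraction, if that genotype were fully viable."""
    return (1.0 - expected_fraction) ** n_pups


# Hypothetical cohort size; the actual counts are in the paper's Table 1.
p_chance = prob_zero_observed(10)
```

Even for a modest cohort the probability drops below conventional significance thresholds, which is why the complete absence of a genotype across litters is usually read as lethality.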

To determine whether XLF and H2AX have overlapping V(D)J 
recombination functions, we generated XLF™4 mice that were homo- 
zygous for a loxP-flanked H2AX allele (H2AX*’")?!??, From these 
mice, we generated v-abl transformed XLFY4H2Ax" pro-B lines 
containing an integrated single-copy inversional V(D)J recombination 
substrate (Fig. 4a, b and Supplementary Fig. 5). We then used Cre 
recombinase to generate matched sets of XLF“H2AX"" and 
XLFY4H2Ax lines, which were treated with STI571 and assayed 
for V(D)J recombination via GFP expression (Supplementary Fig. 5). 
In six matched sets, each with a different substrate integration, H2AX 
from STI571-treated (2 or 4 days) v-abl pro-B lines containing pMX-DEL-CJ 
(c) or pMX-DEL-SJ (d) substrates. Results were obtained from cell pools with 
diverse substrate integrations. Similar results were obtained with single 
integration clones (not shown). Bands reflecting pMX-DEL-CJ UR, CE, CJ 
(panel c) and pMX-DEL-SJ UR, SE, SJ (panel d) are indicated. WT, wild type. 
f, Southern blot with C4 probe of EcoRV-NcoI-digested (top panel) or EcoRV- 
digested (bottom panel) DNA from indicated lines containing a single pMX- 
INV substrate. The XLFΔ/ΔAtmC/C and the XLFΔ/ΔAtm−/− lines have identical 
integrations. See legend to Supplementary Fig. 5 for detailed methods. 


deletion reduced, but did not eliminate, V(D)J recombination (Fig. 4a, 
b and Supplementary Fig. 5), indicating that XLF and H2AX also have 
overlapping V(D)J recombination activities, but not to the same extent 
as XLF and ATM. We also assayed for CJs, hybrid joins and unjoined 
CEs by Southern blotting in three matched sets of XLFΔ/ΔH2AXF/F and 
XLFΔ/ΔH2AX−/− pro-B lines (Fig. 4a, b and Supplementary Fig. 4b). 
XLFΔ/ΔH2AXF/F lines behaved like wild-type or XLFΔ/Δ lines, as they 
robustly generated CJs but not hybrid joins or unjoined CEs (Fig. 4a, b 
and Supplementary Fig. 5). However, in accord with GFP assays, 
XLFΔ/ΔH2AX−/− pro-B lines had substantially reduced CJs compared 
to XLFΔ/ΔH2AXF/F parents, but did not show readily detectable 
unjoined CEs (Fig. 4a, b and Supplementary Fig. 5). 

Recent studies found that H2AX protects unjoined coding ends 
from ATM kinase and CtIP-dependent resection. To test whether 
reduced V(D)J recombination in XLFΔ/ΔH2AX−/− pro-B lines resulted 
from reduced RAG cleavage or reduced joining of RAG cleaved ends 
coupled with end resection, we performed V(D)J joining assays in 
XLFΔ/ΔH2AXF/F and XLFΔ/ΔH2AX−/− pro-B lines treated with an 
ATM kinase inhibitor (Fig. 4a, b and Supplementary Fig. 5). ATM 
inhibitor treatment of the XLFΔ/ΔH2AXF/F lines reproduced the pheno- 
type of XLFΔ/ΔAtm−/− lines, including severely reduced CJs and the 
Figure 3 | ATM and XLF synergize in classical NHEJ during CSR. 
a, Representative flow cytometric analysis of (at least three independent 
experiments for each genotype) purified CD43 splenocytes from the indicated 
mice stained for surface B220 and surface IgG1 after a 4-day stimulation with 
anti-CD40 and IL-4. Additional experiments are in Supplementary Fig. 3. 
b, Summary of IgG1 CSR levels of purified CD43 splenocytes after 4 days of 
anti-CD40 plus IL-4 stimulation. The y-axis shows the average percentage of 
IgG1+ cells determined from multiple experiments with cells from wild type 
(n = 5), XLFΔ/Δ (n = 5), Atm−/− (n = 5) and XLFΔ/ΔAtm−/− HL (n = 8) mice. 
Error bars show standard deviations. ***P < 0.001, based on Student's t-test 
between indicated pairs. c, Percentage of direct junctions relative to direct plus 
microhomology (MH)-mediated junctions between Sμ and Sγ1 in anti-CD40- 
plus IL-4-stimulated B cells. Junctions were obtained from multiple 
independent experiments with XLFΔ/Δ (n = 4), Atm−/− (n = 3) and 
XLFΔ/ΔAtm−/− HL (n = 4) cells. See Supplementary Fig. 4 for details. 


accumulation of unjoined CEs (Figs 2f, 4a, b and Supplementary Fig. 5). 
However, the ATM-inhibitor-treated XLFΔ/ΔH2AX−/− lines now 
yielded a clear band of unjoined CEs associated with a CE smear below 
the band that is characteristic of aberrant end resection (Fig. 4a, b and 
Supplementary Fig. 5). To examine this phenomenon further, we used 
a sensitive TdT end labelling assay, which revealed unjoined coding 
ends in STI571-treated XLFΔ/ΔH2AX−/− pro-B lines without ATM 
inhibitor treatment (Supplementary Fig. 6). These results demonstrate 
that XLF and H2AX have overlapping activities in V(D)J recombina- 
tional joining and also indicate that the unjoined V(D)J CEs in 
XLFΔ/ΔH2AX−/− pro-B lines are largely resected in the absence of 
H2AX. Because we did not find complete restoration of the unjoined 
coding ends in ATM-kinase-inhibitor-treated XLFΔ/ΔH2AX−/− pro-B 
lines, as was observed in ATM-inhibitor-treated classical NHEJ- 
deficient cells that also are H2AX deficient, XLF might also have an 
overlapping function with H2AX in end protection. 

We consistently observed a lower level of TdT-labelled CEs in 
XLFΔ/Δ versus XLFΔ/ΔH2AX−/− pro-B lines after STI571 induction 
in the presence of ATM inhibitor, even though the latter have substan- 
tially higher levels of unjoined CEs (Fig. 4 and Supplementary Figs 5 
and 6). This finding indicated that unjoined CEs in ATM-inhibitor- 
treated XLFΔ/Δ pro-B lines could be blocked from TdT activity. We 
used urea denaturing gel electrophoresis to test for a defect in opening 
Figure 4 | H2AX and XLF have redundant functions. a, b, Southern blot 
analyses of rearrangement of a clonally integrated single-copy pMX-INV 
inversional V(D)J recombination substrate in XLFΔ/ΔH2AXF/F and derivative 
H2AX deleted XLFΔ/ΔH2AX−/− lines. a, Clone 42. b, Clone 29. Top: DNA was 
digested with EcoRV and NcoI and probed with C4 probe. Bottom: DNA was 
digested with EcoRV and probed with C4 probe (see Fig. 2e and legend for 
additional details). Analysis of an independent line is presented in 
Supplementary Fig. 5. 


coding-end hairpins and found that ATM-inhibitor-treated XLFΔ/Δ 
pro-B lines, like Artemis−/− but not Xrcc4−/− lines, indeed accumu- 
lated unopened hairpin CEs (Supplementary Fig. 7). Given the over- 
lapping functions of the ATM kinase and DNA-PK in CSR, plus the 
role of DNA-PK in activating Artemis to cleave hairpin coding ends, it 
seemed possible that dual deficiency for ATM and XLF leads to a DNA- 
PK defect. However, measurements of ionizing-radiation-induced 
phosphorylation of H2AX and KAP-1 (both ATM and DNA-PK sub- 
strates) in the presence or absence of DNA-PK inhibitors revealed 
DNA-PK kinase activity to be as active in XLFΔ/ΔAtm−/− cells as in 
wild-type or Atm−/− cells (Supplementary Fig. 8). However, we cannot 
rule out the possibility that XLFΔ/ΔAtm−/− cells have a specific defect in 
DNA-PK activity in the context of V(D)J recombination joining that 
overlaps with ATM/XLF functions or that a DNA-PK defect in this 
context could represent a more general defect, for example in the 
broader recruitment of classical NHEJ factors. In the latter context, 
we note that the V(D)J recombination defect in XLFΔ/ΔAtm−/− 
pro-B lines is not just in hairpin opening, as we also observe a severe 
defect in recombination signal sequence joining in these cells (Fig. 2). 

ATM, XLF and H2AX previously have been found to have, at most, 
modest roles in V(D)J recombination and, by extension, classical 
NHEJ. Surprisingly, we now find that dual deficiency for XLF 
and ATM has an impact on V(D)J recombination in progenitor lym- 
phocytes and IgH CSR in mature B cells similarly to deficiency for bona 
fide classical NHEJ factors. We conclude that XLF-deficient cells 
require ATM, and that ATM-deficient cells require XLF, to carry out 
classical NHEJ but not A-EJ. These findings further indicate that XLF- 
deficient cells provide a novel system for elucidating major, previously 
unappreciated roles of ATM and ATM substrates in classical NHEJ 
and vice versa. Our findings already suggest that ATM, via phosphor- 
ylation of its substrates, and XLF share overlapping functions primarily 
in the context of chromosomal classical NHEJ. We have also shown 
that the V(D)J recombination defects in XLF/ATM- and XLF/H2AX- 
deficient pro-B lines are not identical either in extent or in outcome. 
Thus, it remains possible that additional ATM substrates, besides 


Table 1 | H2AX and XLF have redundant functions in murine embryonic development 

Stage             H2AX+/+ XLFΔ/Δ   H2AX+/− XLFΔ/Δ   H2AX−/− XLFΔ/Δ   Absorbed   Total 
Birth             14               46               0                0          60 
Birth (expected)  15               30               15               -          - 
E13.5             8                28               0                8          44 
E13.5 (expected)  11               22               11               -          - 

The indicated genotypes were obtained by inter-crossing H2AX+/− XLFΔ/Δ mice as described. H2AX deficiency does not cause embryonic lethality. 
H2AX, may also overlap functionally with XLF. XLF might directly 
influence the same processes as ATM or H2AX, including end proces- 
sing and end resection, respectively. Alternatively, overlapping func- 
tions might be mediated indirectly through distinct processes. For 
example, XLF may influence reaction kinetics by classical NHEJ factor 
recruitment, whereas ATM and ATM substrates seem to tether chromo- 
somal ends for joining: two distinct functions that theoretically 
could be redundant with respect to effects on overall joining activities. 
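
As an illustration of how the birth counts in Table 1 depart from Mendelian expectation, the sketch below runs a chi-square goodness-of-fit test against the 1:2:1 ratio expected from inter-crossing H2AX heterozygotes. It is not an analysis reported by the authors; the function name and the choice of test are assumptions made for this example, and only the observed counts (14, 46, 0) come from the table.

```python
import math

# Observed live-born pups per H2AX genotype (all on an XLF-deficient
# background), taken from the "Birth" row of Table 1.
observed = {"H2AX+/+": 14, "H2AX+/-": 46, "H2AX-/-": 0}

# Expected 1:2:1 Mendelian ratio for a heterozygote inter-cross.
ratio = {"H2AX+/+": 1, "H2AX+/-": 2, "H2AX-/-": 1}


def mendelian_chi_square(observed, ratio):
    """Return (expected counts, chi-square statistic, p-value) for df = 2.

    Hypothetical helper for illustration; valid only for three genotype
    classes, because the p-value formula below assumes 2 degrees of freedom.
    """
    total = sum(observed.values())
    ratio_sum = sum(ratio.values())
    expected = {g: total * r / ratio_sum for g, r in ratio.items()}
    chi2 = sum((observed[g] - expected[g]) ** 2 / expected[g] for g in observed)
    # For 2 degrees of freedom the chi-square survival function is exp(-x / 2).
    p_value = math.exp(-chi2 / 2)
    return expected, chi2, p_value


expected, chi2, p_value = mendelian_chi_square(observed, ratio)
print(expected)           # expected counts reproduce the 15 / 30 / 15 row
print(round(chi2, 2), p_value)
```

The expected counts match the "Birth (expected)" row, and the large statistic reflects the complete absence of double homozygous pups.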


METHODS SUMMARY 

Mice. XLF+/Δ, Atm+/−, Atm+/C, H2AX+/− and H2AX+/F and 'HL' mice have 
been described previously. All HL mice were heterozygous for both IgH and 
IgL knockin alleles. 

Chromosomal V(D)J recombination assays. V(D)J recombination with an inte- 
grated substrate was carried out as described. Briefly, v-abl transformed pro-B- 
cell lines were isolated from various mouse lines that harboured an Eμ-Bcl-2 
transgene. For XRCC4-deficient v-abl transformants, the Eμ-Bcl-2 transgene 
was introduced after establishment of the line. The pro-B lines were infected with 
the pMX-INV, pMX-DelCJ or pMX-DelSJ retroviral vector and assayed for V(D)J 
recombination as described. ATM inhibitor Ku55933 (Cat. No. 118500 from 
EMD Biosciences) was used at a final concentration of 15 μM as described. 

Lymphocyte development and class switch recombination. Lymphocyte popu- 
lations were analysed by flow cytometry as described. Isolation and activation of 
splenic B cells and flow cytometric assays were as described. Sμ-Sγ1 junctions 
were isolated from day 4 anti-CD40 plus IL-4 stimulated B cells, cloned and 
sequenced as described. 
CENP-B preserves genome integrity at replication 
forks paused by retrotransposon LTR 


Mikel Zaratiegui, Matthew W. Vaughn, Danielle V. Irvine, Derek Goto, Stephen Watt, Jürg Bähler, Benoit Arcangioli 
& Robert A. Martienssen 


Centromere-binding protein B (CENP-B) is a widely conserved 
DNA binding factor associated with heterochromatin and centro- 
meric satellite repeats. In fission yeast, CENP-B homologues have 
been shown to silence long terminal repeat (LTR) retrotransposons 
by recruiting histone deacetylases. However, CENP-B factors also 
have unexplained roles in DNA replication. Here we show that a 
molecular function of CENP-B is to promote replication-fork 
progression through the LTR. Mutants have increased genomic 
instability caused by replication-fork blockage that depends on 
the DNA binding factor switch-activating protein 1 (Sap1), which 
is directly recruited by the LTR. The loss of Sap1-dependent barrier 
activity allows the unhindered progression of the replication fork, 
but results in rearrangements deleterious to the retrotransposon. 
We conclude that retrotransposons influence replication polarity 
through recruitment of Sap1 and transposition near replication- 
fork blocks, whereas CENP-B counteracts this activity and pro- 
motes fork stability. Our results may account for the role of LTR 
in fragile sites, and for the association of CENP-B with pericentro- 
meric heterochromatin and tandem satellite repeats. 

In fission yeast, CENP-B proteins are encoded by three homologues, 
autonomously replicating sequence binding protein 1 (abp1), cenp-B 
homologue 1 (cbh1) and cbh2, and were previously characterized as 
DNA binding factors at origins of replication and centromeric repeats, 
respectively. Mutants of abp1 grow slowly, whereas double mutants 
with cbh1 or cbh2 have severely stunted growth, abnormal mitosis and 
morphological defects, and triple deletion mutants are inviable. As a 
result, double Δabp1Δcbh1 mutants form microcolonies on solid 
media (Fig. 1a and Supplementary Table 1) and exhibit high levels 
of cell death (Supplementary Fig. 1). We observed the spontaneous 
appearance of faster growing cells in a culture of Δabp1Δcbh1 that 
grew at rates similar to the Δabp1 single mutant, lacked morphological 
defects (Fig. 1a and Supplementary Table 1) and showed lower levels of 
cell death (Supplementary Fig. 1). Genetic analysis revealed the pres- 
ence of a single essential locus that also suppressed the lethality of the 
triple mutant Δabp1Δcbh1Δcbh2 (not shown). We performed whole- 
genome resequencing in the mutant strain and isolated a missense 
mutation in the coding sequence of the DNA binding factor Sap1 
(sap1E101D, henceforth called sap1-c; Supplementary Fig. 2) that 
co-segregated with suppression of slow growth in Δabp1Δcbh1 and 
resulted in lethality in a wild-type background. Sap1 is a protein with 
essential roles in chromosome stability. Sap1 has been implicated in a 
programmed replication-fork block in the ribosomal DNA (rDNA) 
monomer that ensures directional replication to prevent mitotic 
recombination between rDNA repeats. 

To test the effects of CENP-B and sap1-c mutations on genome integ- 
rity, we examined chromosomes by pulsed-field gel electrophoresis. 
Although single Δabp1 and Δcbh1 mutants had wild-type chromosome 
lengths, the double Δabp1Δcbh1 mutant had a smear of DNA fragments 
indicating double-strand breaks in all three chromosomes (Fig. 1b). 
Treatment of the Δabp1Δcbh1 sample plugs with the restriction enzyme 
NotI allowed migration of the chromosomes into the gel, and detection of 
telomeric and centromeric sequences (Supplementary Fig. 3), suggesting 
the presence of scattered unresolved replication or recombination inter- 
mediates that interfere with the migration of full-length chromosomes, 


Figure 1 | DNA damage in CENP-B mutants is suppressed by sap1 
mutation. a, Images of 10° plated cells of wild type (WT), Δabp1, 
Δcbh1 and Δabp1Δcbh1 with Δabp1Δcbh1sap1-c colonies. 
Microscopy image inserts: branched phenotype in Δabp1Δcbh1 
background (right) and Δabp1Δcbh1sap1-c mutant (left). 
Scale bar, 10 μm. b, Pulsed-field gel blot analysis of WT, CENP-B 
mutants (Δabp1, Δcbh1, Δabp1Δcbh1) and five CENP-B/sap1-c 
mutant isolates (Δabp1Δcbh1sap1-c). The position of the three 
chromosomes is indicated on the left. The image is a false-coloured 
composite of hybridizations for all three chromosomes. 
but not with NotI-digested DNA, into the pulsed field gel. This indicates 
that Abp1 and Cbh1 have roles in the maintenance of genome integrity. 
Surprisingly, the sap1-c mutation restored genome integrity to all 
chromosomes, with chromosome 3 exhibiting size variability in several 
isolates of Δabp1Δcbh1sap1-c mutant (Fig. 1b). In fission yeast, chro- 
mosome 3 harbours the rDNA repeats. Temperature-sensitive alleles of 
sap1 exhibit changes in the size of chromosome 3 attributable to loss of 
fork barrier activity and an increase in mitotic recombination at rDNA. 
The changes in the size of chromosome 3 in the sap1-c mutants are 
associated with altered rDNA copy number (Supplementary Fig. 4). The 
temperature-sensitive alleles sap1-1 and sap1-48 (ref. 12) suppressed 
slow growth in the Δabp1Δcbh1 double mutant, mimicking sap1-c 
(Supplementary Fig. 5). Consistent with a reduction in fork barrier 
activity, a probe containing a canonical Sap1-binding sequence had 
reduced electrophoretic mobility shift in crude extracts from sap1-c 
mutants (Supplementary Fig. 6). We conclude that the suppression of 
the Δcbh1Δabp1 phenotype is not specific to the sap1-c mutation but a 
result of defective function of Sap1, and therefore that the loss of genome 
integrity in Δabp1Δcbh1 mutants is a consequence of Sap1 activity. 
Blocked replication forks are potential sources of genome instability 
because they can lead to collapse of the replisome and double-strand 
break formation. The fact that Sap1 activity leads to DNA damage in 
the absence of Abp1/Cbh1 suggests that the function of CENP-B is to 
manage Sap1-arrested replication forks. In the absence of Sap1, loss of 
replication-fork blockage would render Abp1/Cbh1 activity unnecessary 
and lead to increased genome stability in Δabp1Δcbh1 mutants. This 
model predicts that CENP-B and Sap1 would co-localize to the regions 
where they acted on the replication fork, and that these regions would 
engage in homologous recombination and degrade to double-strand 
breaks in the absence of CENP-B. To test this hypothesis, we performed 
chromatin immunoprecipitation of Sap1, Abp1 and Cbh1 followed by 
high-throughput sequencing (ChIP-seq). Abp1 has previously been 
shown to localize and recruit Cbh1 to the LTRs of Tf1 and Tf2 retro- 
transposons, where Abp1/Cbh1 play a role in their transcriptional silen- 
cing. We demonstrated a strong co-localization of Sap1 with Abp1 and 
Cbh1 at these LTRs as well as at solo LTRs scattered throughout the 
genome (Fig. 2a, c and Supplementary Fig. 7a, b) and at the mating type 
locus (Supplementary Fig. 8), where Sap1 and Abp1 have been described 
to regulate mating-type switching. Both Sap1 and Abp1/Cbh1 also 


localized to genomic regions independently of each other, suggesting 
that they do not form a stable complex or mediate their mutual recruit- 
ment. In particular, Abp1 exhibited binding to transfer RNA (tRNA) 
genes (Fig. 2b and Supplementary Fig. 7b), known to be potent replica- 
tion pause sites. Abp1 and Cbh1 co-localize to a highly A/T-rich 
region located in positions 100-150 of the LTR (Fig. 2c and Sup- 
plementary Fig. 7a, b). The localization of Sap1 within the LTR was 
concentrated in the first 50 base pairs of sequence (Fig. 2c), coinciding 
with a predicted Sap1-binding site (Supplementary Fig. 7a, c). We 
tested this sequence by electrophoretic mobility shift assay and detected 
specific binding in wild-type extracts (Fig. 2d) as well as decreased 
binding and altered mobility in extracts from Δabp1Δcbh1sap1-c 
mutants (Supplementary Fig. 7d). Interestingly, solo LTR and full-length 
Tf2 insertions were associated with a prominent peak of Sap1 binding 
located outside the 3' end of the transposon sequence (Fig. 2c). These 
observations indicate that Sap1 binding precedes and possibly guides Tf 
element integration. To test this prediction, we plotted the average 
enrichment of Sap1, Abp1 and Cbh1 around more than 70,000 de novo 
Tf1 integration sites recently reported. We observed a dramatic asso- 
ciation of these integration sites with a peak of Sap1 binding immediately 
downstream of the insertion site (Fig. 2e and Supplementary Fig. 8) and 
no appreciable CENP-B enrichment. These results strongly suggest that 
Sap1-binding sequences determine the targeting and orientation of Tf 
retroelement transposition. 
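
The idea of averaging enrichment around many integration sites can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name, the per-base `coverage` list and the `(position, strand)` site format are all hypothetical, and the toy numbers are invented purely to show the mechanics of strand-aware averaging.

```python
def average_profile(coverage, sites, flank):
    """Average a per-base signal in a window of +/- flank around each site.

    coverage: per-base enrichment values for one chromosome (hypothetical
              input, e.g. ChIP signal normalized to input).
    sites:    (position, strand) tuples, strand being "+" or "-".
    Returns a list of length 2 * flank + 1 running from -flank to +flank
    relative to the site, with upstream always on the left.
    """
    width = 2 * flank + 1
    totals = [0.0] * width
    used = 0
    for pos, strand in sites:
        start, end = pos - flank, pos + flank + 1
        if start < 0 or end > len(coverage):
            continue  # skip windows that run off the chromosome
        window = coverage[start:end]
        if strand == "-":
            window = window[::-1]  # flip so downstream is always on the right
        for i, value in enumerate(window):
            totals[i] += value
        used += 1
    if used == 0:
        return [0.0] * width
    return [t / used for t in totals]


# Toy data: a signal peak 2 bp downstream of each insertion,
# on either strand (invented values, for illustration only).
coverage = [0.0] * 20
coverage[7] = 4.0    # downstream of the "+" site at position 5
coverage[12] = 4.0   # downstream of the "-" site at position 14
profile = average_profile(coverage, [(5, "+"), (14, "-")], flank=3)
print(profile)
```

Flipping minus-strand windows before averaging is what lets a peak that sits downstream of the insertion show up at a single offset in the aggregate, regardless of insertion orientation.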

To evaluate the mutual influence of Sap1 and Abp1/Cbh1 on LTR bind- 
ing, we performed ChIP analysis of Sap1 in Δabp1Δcbh1 mutants and of 
Abp1 in temperature-sensitive sap1 mutants that affect DNA-binding 
activity. Sap1 binding to the LTR was unaffected in Δabp1 and Δcbh1 
mutants, and was slightly increased in Δabp1Δcbh1 double mutants 
(Fig. 2f), but consistently reduced (twofold) in Δabp1Δcbh1sap1-c. 
Conversely, Abp1 binding to the LTR was increased between two 
and three times at the permissive temperature in sap1-1 and sap1-48 
mutants (Fig. 2g). These results indicate that Sap1 and Abp1/Cbh1 
bind to the LTR independently of each other and mutually counteract 
their recruitment, and that the sap1-c mutation impairs its binding to 
the LTR in vivo as well as in vitro (Supplementary Fig. 4). 

A failure of replication-fork stability at LTRs, which are distributed 
throughout the genome, would explain the widespread DNA damage 
in Δabp1Δcbh1 mutants. We assessed the behaviour of the replication 
Figure 2 | Sap1 and CENP-B co-localize at the LTR of retrotransposons in 
vivo. Average genome-wide enrichment by ChIP-seq of Sap1, Abp1 and Cbh1 
on (a) all Tf2 elements, (b) euchromatic tRNA and (c) solo LTR. Error bars, 
s.e.m. d, Left panel, competition electrophoretic mobility shift assay; right 
panel, inactivation by incubation with anti-Sap1 serum. e, Average Sap1, Abp1 
and Cbh1 enrichment around Tf1 de novo insertion points. f, ChIP of Sap1 
with LTR of Tf2 in CENP-B and sap1-c mutants and (g) of Abp1 with LTR of 
Tf2 in sap1 temperature-sensitive mutants. Error bars, s.d. for triplicates. 
Figure 3 | CENP-B promotes replication-fork progression through the 
Sap1-dependent barrier present at the LTR and prevents homologous 
recombination. a, Two-dimensional gel electrophoresis of a plasmid fragment 
containing the Tf2 LTR oriented towards (left) and away (right) from the ars1 
origin. Arrows, paused replication intermediates; open arrows, recombination 


fork as it traversed the LTR using two-dimensional agarose gel elec- 
trophoresis. Sap1-dependent programmed fork blocks are directional 
and only hinder fork progression in one orientation. We cloned a 
full-length LTR and its first 50 base pairs (containing the Sap1-binding 
site) in a plasmid in both orientations with respect to the replication 
origin ars1. Two-dimensional gel electrophoresis in a wild-type strain 
transformed with this episomal system showed a modest accumulation 
of fork signal at the location of the cloned LTR (Fig. 3a), but only when 
the Sap1-binding site was proximal to the origin, and not in the opposite 
orientation (Supplementary Fig. 9). The Sap1-binding site was suf- 
ficient for this blocking activity, with the same orientation requirement 
(Supplementary Fig. 9). We next assayed the LTR for pausing activity in 
Δabp1, Δcbh1 and sap1-c mutants (Fig. 3a). Strikingly, the paused fork 
signal was consistently enhanced and always at the same location in 
Δabp1 and Δcbh1 mutants, whereas the Δabp1Δcbh1 double mutant 
exhibited additional signals outside the replication arc, suggestive of 
recombination intermediates. The fork-blocking activity of the LTR 
disappeared in Δabp1Δcbh1sap1-c mutants. Unresolved fork blocks 
can collapse and undergo homologous recombination for fork recovery. 
We confirmed the presence of homologous recombination in the 
Δabp1Δcbh1 double mutants by measuring the increase in the forma- 
tion of Rad22 (homologous to Rad52 in Saccharomyces cerevisiae) foci 
in a Rad22-yellow fluorescent protein (YFP) strain (Fig. 3b). We 
observed that Δabp1Δcbh1 double-mutant cells accumulated the 
homologous recombination protein Rad22 at the LTR (Fig. 3c). 
Consistently, the recombination factor Rhp51 (Rad51 homologue) 
was essential for viability of Δabp1Δcbh1 double mutants (Supplemen- 
tary Fig. 10), indicating that homologous recombination is necessary for 
recovery from fork stalling at LTRs. These results indicate that Abp1/ 
Cbh1 counteract Sap1 barrier activity and stabilize the replication fork 
at LTRs. Loss of this activity results in loss of genome integrity and in 
homologous recombination at the LTR in Δabp1Δcbh1 mutants. 

The Sap1-binding sequence is conserved in Tf1 and Tf2 retrotranspo- 
son LTRs (Supplementary Fig. 7c), suggesting that it plays a role in the 
retrotransposon life cycle. We assayed the effect of sap1 and abp1/cbh1 
on Tf2 stability by measuring the frequencies of loss of a ura4 reporter 
transgene inserted in the Tf2-6 transposon. Mutation of abp1 resulted 
in a dramatic decrease of Tf2 ectopic recombination, which returned to 
normal levels when sap1 was also mutated (Fig. 4). In the presence of 
sap1+ there is a preference for gene conversion, which normally con- 
stitutes most ectopic recombination events; however, in Δabp1sap1-c 
and Δcbh1sap1-c mutants the proportion of eviction and conversion 
events is similar (Fig. 4). Therefore we propose that the LTR recruits 
Sap1 to control the direction of transposon replication and increase 
transposon persistence in the genome, perhaps by coordinating lagging 
strand synthesis, which prevents single-strand annealing from com- 
plementary direct repeats (Supplementary Fig. 11a, b). CENP-B counter- 
acts this activity, possibly by promoting replication-fork progression 


intermediates. The percentage of signal over the LTR is indicated below each 
panel. b, Quantification of Rad22-YFP foci (n > 400 nuclei for all mutants). 
Error bars, s.e.m. c, Rad22-YFP ChIP with LTR in wild type, Δabp1 and 
Δabp1Δcbh1 mutants. Error bars, s.d. for triplicates. 


through the Sap1-dependent barrier. Thus CENP-B and Sap1 promote 
genome and transposon integrity, respectively, in a ‘tug-of-war’ between 
transposon and host. Abp1 stimulates fork progression by recruiting the 
Figure 4 | CENP-B and Sap1 have opposite effects on Tf2 stability. 
a, Ectopic recombination fluctuation assay. Two potential mechanisms of ura4 
loss from the marked Tf2-6::ura4 are indicated: gene conversion and eviction by 
LTR recombination. Columns represent total median ura4 loss frequency in 
wild type, Δabp1, Δcbh1, Δabp1sap1-c and Δcbh1sap1-c mutants; error bars, 
95% confidence intervals. Tints indicate distribution of mode of ectopic 
recombination events in the ura4 colonies obtained from wild type (n = 93), 
Δabp1 (n = 88), Δcbh1 (n = 94), Δabp1sap1-c (n = 91) and Δcbh1sap1-c 
(n = 89) mutants. b, Model for the interactions between Abp1, Cbh1, Sap1 and 
the replication fork at the LTR. 
fork-restart protein MCM10 (ref. 4), which has primase activity. 
Additionally, the histone deacetylase Mst1, which has roles in replica- 
tion-fork stability, interacts directly with Cbh1 (ref. 23). In S. cerevisiae the 
histone deacetylase Sir2 silences and inhibits recombination in repetitive 
DNA. CENP-B factors recruit the histone deacetylases Clr3 and Clr6, 
which perform LTR silencing. The result of these functions would be to 
preserve genome integrity at LTRs by preventing DNA damage and 
recombination. This novel role of CENP-B may not be limited to LTR 
and tDNA, as mutation of the replication-fork blocking factor reb1, 
which is specific to rDNA repeats, also suppresses the slow growth of 
the abp1 mutant. Similarly, our ChIP-seq data indicate that Sap1 may 
also be implicated in the functionality of the replication terminator RTS1 
(Supplementary Fig. 8) in collaboration with Rtf1. In this manner, the 
function and regulation of the Sap1-bound regions is determined by the 
binding in their vicinity of different factors affecting replication-fork 
progression. 

Because of their repetitive nature, transposons have a close relation- 
ship with replication and recombination. For example, the IS608 trans- 
poson of Escherichia coli is targeted to the lagging strand and always 
replicated in the same direction. This might prevent recombination 
between tandemly arranged copies. We have shown that retrotranspo- 
sons influence DNA replication through recruitment of directional 
fork blocking factor Sap1 and that activity of CENP-B is required 
for replication-fork management. Additionally, retrotransposition is 
targeted to the genomic localization of Sap1. These mechanisms influ- 
ence the replicative dynamics of the host genome. The genomes of 
eukaryotes show widespread colonization by retrotransposons, and 
pericentromeric satellite repeats are often of transposon origin. 
When such sequences are arranged as tandem repeats, control of 
replication direction by CENP-B would prevent chromosome breaks 
and preserve genome integrity. This mechanism accounts for the role 
of other regulators of fork progression in inter-LTR recombination. 
In contrast, when flanked by LTR in opposite orientations, fragile sites 
fail to replicate and result in chromosome breaks. 


METHODS SUMMARY 


ChIP was performed using tagged TAP-Abp1 and TAP-Cbh1 strains with an 
Anti-Calmodulin Binding Protein antibody (Millipore) and a polyclonal serum 
against the native Sap1 protein. High-throughput sequencing was performed on 
an Illumina G2 genome analyser, and analysed for polymorphism detection or 
statistical analysis of enrichment. Two-dimensional gel electrophoresis was per- 
formed as described; see Supplementary Information for construction of the 
episomal system. Electrophoretic mobility shift assay was performed as described 
previously. 
Taxadiene synthase structure and evolution of 
modular architecture in terpene biosynthesis 


Mustafa Köksal, Yinghua Jin, Robert M. Coates, Rodney Croteau & David W. Christianson 


With more than 55,000 members identified so far in all forms of life, 
the family of terpene or terpenoid natural products represents the 
epitome of molecular biodiversity. A well-known and important 
member of this family is the polycyclic diterpenoid Taxol (paclitaxel), 
which promotes tubulin polymerization and shows remarkable effi- 
cacy in cancer chemotherapy. The first committed step of Taxol 
biosynthesis in the Pacific yew (Taxus brevifolia) is the cyclization 
of the linear isoprenoid substrate geranylgeranyl diphosphate 
(GGPP) to form taxa-4(5),11(12)-diene, which is catalysed by taxa- 
diene synthase. The full-length form of this diterpene cyclase con- 
tains 862 residues, but a roughly 80-residue amino-terminal transit 
sequence is cleaved on maturation in plastids. We now report the 
X-ray crystal structure of a truncation variant lacking the transit 
sequence and an additional 27 residues at the N terminus, hereafter 
designated TXS. Specifically, we have determined structures of TXS 
complexed with 13-aza-13,14-dihydrocopalyl diphosphate (1.82 Å 
resolution) and 2-fluorogeranylgeranyl diphosphate (2.25 Å resolu- 
tion). The TXS structure reveals a modular assembly of three α- 
helical domains. The carboxy-terminal catalytic domain is a class I 
terpenoid cyclase, which binds and activates substrate GGPP with a 
three-metal ion cluster. The N-terminal domain and a third 'inser- 
tion' domain together adopt the fold of a vestigial class II terpenoid 
cyclase. A class II cyclase activates the isoprenoid substrate by proto- 
nation instead of ionization, and the TXS structure reveals a defin- 
itive connection between the two distinct cyclase classes in the 
evolution of terpenoid biosynthesis. 

Although the first structures of C10 monoterpene, C15 sesquiterpene 
and C30 triterpene cyclases appeared several years ago, the structure of 
the 'missing link' in this series, a C20 diterpene cyclase, has been 
unknown until now. Plant diterpene cyclases such as taxadiene synthase 
are perhaps the most intriguing because they are the largest terpenoid 
cyclases (800-900 residues) and they are believed to be the most closely 
related to the ancestral plant terpenoid synthase. Triple-domain plant 
diterpene synthases are believed to have evolved through the fusion of 
single-domain and double-domain bacterial diterpene cyclases, which in 
turn evolved from ancient progenitors. 

The two distinct classes of terpenoid cyclase have unrelated protein 
folds and use different substrate activation mechanisms. A class I 
terpenoid cyclase uses a trinuclear metal cluster liganded by conserved 
motifs DDXXD and (N,D)DXX(S,T)XXXE (bold indicates typical 
metal ligands) to trigger the ionization of the isoprenoid substrate 
diphosphate group, which generates a carbocation to initiate catalysis. 
A class II terpenoid cyclase initiates carbocation formation by general 
acid catalysis, using the 'middle' aspartic acid in a DXDD motif to 
protonate an isoprenoid double bond or oxirane moiety. Taxadiene 
synthase lacks a DXDD motif but contains conserved metal-binding 
motifs and requires Mg2+ for optimal catalytic activity, indicating that 
it functions as a class I terpenoid cyclase. 
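
The motif logic in this paragraph can be sketched as a simple sequence scan. The example below is only an illustration: the function name and the two toy peptide strings are invented for this sketch (they are not taxadiene synthase sequence), and the presence or absence of a motif in a primary sequence is a hint about cyclase class, not proof of activity.

```python
import re

# Motifs quoted in the text, written as regular expressions
# ("X" in the motif means any residue, "." in the pattern).
MOTIFS = {
    "class I DDXXD": r"DD..D",
    "class I (N,D)DXX(S,T)XXXE": r"[ND]D..[ST]...E",
    "class II DXDD": r"D.DD",
}


def find_motifs(sequence):
    """Return {motif name: [(1-based start, matched residues), ...]}.

    Hypothetical helper for illustration. A lookahead wrapper is used so
    that overlapping occurrences of a motif are all reported.
    """
    seq = sequence.upper()
    hits = {}
    for name, pattern in MOTIFS.items():
        overlapping = re.compile("(?=(" + pattern + "))")
        hits[name] = [(m.start() + 1, m.group(1))
                      for m in overlapping.finditer(seq)]
    return hits


# Toy peptides (invented): one carrying both class I motifs,
# one carrying only the class II motif.
class_one_like = find_motifs("MKLDDAYDGSANDLVSYGKEAL")
class_two_like = find_motifs("AAGDIDDTAMA")
print(class_one_like)
print(class_two_like)
```

A sequence that reports both metal-binding motifs and no DXDD hit fits the pattern the text describes for taxadiene synthase.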

Expression and analysis of N-terminal truncation variants of 
taxadiene synthase revealed that deletions of 60 or 79 residues yield
catalytically active proteins, whereas deletions of 93, 113 or 126
residues yield catalytically inactive proteins'*. These results implicate
the N-terminal segment D⁸⁰DIPRLSANYHGDL⁹³ in catalysis. The
N-terminal truncation variant lacking 60 residues has been studied
with deuterated'*! and fluorinated” analogues of GGPP. These studies
suggest a cyclization mechanism (Fig. 1) in which the diphosphate
leaving group, the 14,15 π bond, and the 10,11 π bond of GGPP are
optimally aligned for leaving-group departure with the formation of a
verticillen-12-yl carbocation intermediate in the first step(s) of catalysis.
Conformational inversion followed by 11α,7α-proton transfer
and transannular B/C ring closure subsequently generates the
taxen-4-yl carbocation, whose deprotonation yields taxa-4(5),11(12)-diene.
The intramolecular proton transfer required to initiate transannular 
B/C ring closure occurs without the assistance of an enzyme-bound 
base*®. The base mediating the final deprotonation step has not yet 
been identified. 
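
The inference from the truncation variants to the implicated N-terminal segment can be checked with a few lines of arithmetic. The sketch below uses only the deletion sizes and the segment sequence quoted in the text; the variable names are illustrative.

```python
# Activity of N-terminal truncation variants as reported in the text:
# number of residues deleted -> catalytically active?
truncations = {60: True, 79: True, 93: False, 113: False, 126: False}

largest_active = max(n for n, active in truncations.items() if active)
smallest_inactive = min(n for n, active in truncations.items() if not active)

# Deleting residues 1..79 is tolerated but deleting 1..93 is not, so at
# least one essential residue lies in the window 80..93 (inclusive).
first, last = largest_active + 1, smallest_inactive

segment = "DDIPRLSANYHGDL"  # the segment named in the text
numbering = {first + i: residue for i, residue in enumerate(segment)}

print(first, last, len(segment), numbering[84])
```

The window spans residues 80 to 93, the 14-residue segment fills it exactly, and residue 84 is an arginine.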

The successful crystallization of TXS required co-crystallization with
Mg²⁺ and either 13-aza-13,14-dihydrocopalyl diphosphate (ACP) or
2-fluorogeranylgeranyl diphosphate (FGP) (molecular structures are
shown in Supplementary Fig. 1). Although not active, this truncation
variant is exceptionally stable and is the only form examined that
yielded satisfactory crystals. Surprisingly, TXS contains three α-helical
domains and harbours the folds of both class I and class II terpenoid
cyclases (Fig. 2); this structure is representative of nearly all diterpene
cyclases. The C-terminal domain (S553-V862) has the class I terpenoid
synthase fold that was first observed in farnesyl diphosphate synthase”
and subsequently observed and designated the class I terpenoid
synthase fold*’* in monoterpene and sesquiterpene cyclases’°. This
fold is also observed in geranylgeranyl diphosphate synthase, which
generates the substrate for diterpene cyclases. The N-terminal domain
of TXS (M107-I135 and S349-Q552) together with the ‘insertion’
domain” (S136-Y348) comprise the double α-barrel class II terpenoid
synthase fold that was first observed in the triterpene cyclase
squalene-hopene cyclase’ and later observed in oxidosqualene cyclase**. TXS
shares no significant overall amino-acid sequence identity with these 
triterpene cyclases. 

Comparison of TXS with other terpenoid cyclases reveals that 
cyclase architecture is modular in nature and can consist of one, two 
or three domains (Fig. 2). Bacterial and fungal sesquiterpene cyclases 
are single-domain enzymes that adopt the class I terpenoid synthase
fold; the first such enzymes to yield crystal structures were pentalenene
synthase® and trichodiene synthase”, respectively. Plant monoterpene
and sesquiterpene cyclases generally contain two domains: the
C-terminal domain adopts the class I terpenoid synthase fold, and
the N-terminal domain adopts an unrelated α-helical fold that, as first
noted by Wendt and Schulz”, is homologous to the N-terminal domain 
of the class II triterpene cyclase squalene-hopene cyclase’. The first 
plant monoterpene and sesquiterpene synthases to yield crystal struc- 
tures were bornyl diphosphate synthase’ and 5-epi-aristolochene 
synthase’, respectively. Most plant diterpene synthases contain three
domains, the third being an insertion conserved in sequence and
position’®. It was correctly predicted that this domain is homologous to the
insertion domain of a triterpene cyclase on the basis of bioinformatics
analysis’’.

It is interesting to note that the class II triterpene cyclases
squalene-hopene cyclase’® and oxidosqualene cyclase** are monotopic
membrane proteins: each penetrates, but does not completely pass through,
the membrane in which it is localized. Their triterpene substrates
(squalene and squalene oxide, respectively) are solubilized in the
membrane and enter the active-site cavity through a hydrophobic channel
open to the membrane surface. A nonpolar ‘plateau’ flanks the
entrance to this channel near helix 8 in their respective insertion
domains; helix 8 is quite hydrophobic in nature and probably serves
as the membrane anchor (Fig. 2). In contrast, TXS functions in the
plastid lumen, so its insertion domain does not contain the
corresponding hydrophobic components.

Figure 1 | Proposed catalytic mechanism of taxadiene synthase. The
cyclization of GGPP to form taxadiene is the first committed step of Taxol
(paclitaxel) biosynthesis in yew species. OPP, diphosphate; Ph, phenyl; Ac,
acetyl; Bz, benzoyl. Taxadiene is converted to Taxol through a lengthy series of
oxidation and acylation steps.
[Scheme labels: verticillen-12-yl cation; H+ transfer; verticillen-8-yl cation; taxen-4-yl cation; Taxol (paclitaxel).]

Figure 2 | Structural relationships among terpenoid cyclases. The class I
terpenoid cyclase fold of pentalenene synthase® (PDB accession code 1PS1)
(blue; ‘α domain’"*) contains metal-binding motifs DDXXD and
(N,D)DXX(S,T)XXXE (red and orange, respectively); in 5-epi-aristolochene
synthase’ (PDB accession code 1LZ9) this domain is linked to a smaller vestigial
domain (green; ‘β domain’!’). A related domain is found in the class II
terpenoid cyclase fold of squalene-hopene cyclase’® (PDB accession code
1SQC), where it contains the general acid motif DXDD (brown) and a second
domain (yellow; ‘γ domain’’’) inserted between the first (purple) and second
helices; a hydrophobic plateau flanking helix 8 (grey stripes) enables membrane
insertion. Taxadiene synthase (PDB accession code 3P5R) contains both class I
and class II terpenoid cyclase folds, but only the class I domain is catalytically
active. The role of N termini (purple) in class I plant cyclases is to ‘cap’ the active
site, as shown for 5-epi-aristolochene synthase.
[Panels: pentalenene synthase (Streptomyces sp. UC5319); 5-epi-aristolochene synthase (Nicotiana tabacum); squalene-hopene cyclase (Alicyclobacillus acidocaldarius); taxadiene synthase (Taxus brevifolia).]

The active site of TXS is located in the C-terminal domain and is the
exclusive binding site of the substrate analogue FGP (Fig. 3a and
Supplementary Fig. 3a) and the bicyclic isoprenoid ACP (Supplementary
Fig. 2; ACP does not mimic any intermediates in the TXS reaction,
although it does mimic a common intermediate of many other diterpene
cyclases). Metal-binding motifs that signal class I terpenoid cyclase
function'>”’ are conserved in TXS as D⁶¹³DMAD and N⁷⁵⁷DTKTYQAE.
Mg²⁺ ions A and C are coordinated by D613 and D617, and
Mg²⁺ ion B is chelated by N757, T761 and E765 (Fig. 3b and
Supplementary Fig. 3b). Along with the recent observation of a trinuclear
metal cluster in the active site of isoprene synthase”, the structure of the
TXS-Mg²⁺₃-FGP complex indicates that three-metal ion catalysis is
conserved across the greater family of class I terpenoid synthases: C₅
hemiterpene, C₁₀ monoterpene, C₁₅ sesquiterpene and C₂₀ diterpene
synthases.

In addition to metal coordination interactions, the diphosphate
group of FGP also accepts hydrogen bonds from R754 and N757
(the latter residue also coordinates to Mg²⁺ ion B) and makes
water-mediated hydrogen bonds with Y688, E691, Y835, S713, R768 and
Q770. It is interesting to compare the molecular recognition of the
FGP diphosphate group with that of the product diphosphate group in
the plant monoterpene cyclase bornyl diphosphate synthase’
(Supplementary Fig. 3c). Most residues that assist the trinuclear metal
cluster in binding and activating the substrate diphosphate group are
conserved between these cyclases.

Figure 3 | Binding of substrate analogue to TXS. a, Simulated annealing
|Fo| − |Fc| omit map in which FGP and three Mg²⁺ ions are omitted from the
structure factor calculation (contoured at 3.0σ); the side chains of metal ligands
are indicated. b, Molecular recognition of the substrate diphosphate group in the
TXS active site. For clarity, the isoprenoid moiety of FGP is truncated to one
carbon (grey). Metal coordination and hydrogen bond interactions are indicated
by thin solid and dashed lines, respectively. Atoms are colour-coded as follows:
yellow, carbon; blue, nitrogen; red, oxygen; orange, phosphorus. Mg²⁺ ions (A, B
and C) and water molecules are shown as purple and red spheres, respectively. A
corresponding stereo figure is shown in Supplementary Fig. 3.

Class I terpenoid synthases undergo a significant structural
transition from an open to a closed active-site conformation after the
binding of three Mg²⁺ ions and the substrate diphosphate group, and this
conformational transition helps to protect reactive carbocation
intermediates from premature quenching by bulk solvent'*"®. Although the
structure of the fully open conformation of TXS is unavailable, we
suggest that the structural changes observed between open and closed
active-site conformations in plant monoterpene and sesquiterpene
cyclases are representative of those that occur in the plant diterpene
cyclase TXS. For example, active-site closure in bornyl diphosphate
synthase’ and 5-epi-aristolochene synthase” involves conformational
changes of loops flanking the mouth of the active site; in addition, the
N-terminal polypeptide ‘caps’ each active site. Specifically, the
N-terminal polypeptide binds in a groove defined by the A-C and
D-D1 loops on one side, and the J-K and H-H-α1 loops on the other.
Tandem arginine residues in the N terminus of bornyl diphosphate
synthase make key hydrogen-bond interactions in this groove. The N
terminus of 5-epi-aristolochene synthase contains only a single
corresponding arginine residue, R15, that seems to serve a similar
function in the structure of the closed active-site conformation’. By analogy
with the structures of these plant monoterpene and sesquiterpene
cyclases, R84 in the missing N-terminal segment of TXS may help to
stabilize the fully closed, catalytically active conformation of mature
taxadiene synthase. Accordingly, the closed conformations of the J-K
loop and the N-terminal segment of TXS are readily modelled on the
basis of the bornyl diphosphate synthase structure to approximate the
enclosed active-site contour that serves as the template for GGPP
cyclization (Fig. 4 and Supplementary Fig. 4).

Figure 4 | Active-site cavities of terpenoid synthases. a, Active-site volumes
are generally slightly larger than corresponding substrate and product volumes,
perhaps to accommodate structural changes better during the cyclization
cascade. Abbreviations are defined in Supplementary Table 2. b, Superposition
of TXS (blue) and bornyl diphosphate synthase (green) guides the modelling of
the J-K loop and the N-terminal segment of TXS (red) to define the enclosed
active-site cavity (the magenta meshwork indicates the solvent-accessible
surface). c, One orientation of taxadiene (blue) fits in the active-site cavity such
that the H5β atom of the preceding taxen-4-yl cation would be oriented
towards the diphosphate leaving group, suggesting that the PPi anion could
serve as the stereospecific base that terminates the cyclization cascade. Three
Mg²⁺ ions (A, B and C) and FGP are shown for reference; all protein atoms are
omitted for clarity. A corresponding stereo figure is shown in Supplementary
Fig. 4.
[Panel a: plot of volume (0 to 1,000) against number of terpenoid carbons (0 to 35); labelled enzymes include ISPS, BPPS, 5EAS, TDS, TXS and SHC; series: cavity, substrate, product.]

The active-site contour of TXS encloses a larger volume than the 
active sites of monoterpene or sesquiterpene cyclases, which is consist- 
ent with the larger isoprenoid substrate of the diterpene cyclase. The 
active-site cavity volumes of terpenoid synthases correlate with the 
hydrocarbon volume of their respective isoprenoid substrates (Fig. 4a 
and Supplementary Table 2). It has been suggested"® that the shape of 
the active-site contour is more product-like for high-fidelity cyclases— 
that is, those that generate a single cyclization product—whereas if the 
active-site contour is less product-like, a more promiscuous cyclase 
results that generates multiple cyclization products. For TXS, the 
active-site volume is significantly larger than the volume of the product 
taxadiene. This is consistent with the observation that TXS is a some- 
what promiscuous cyclase, generating about 20% of the alternative 
isomer taxa-4(20),11(12)-diene*’. Indeed, the fact that TXS binds the 
bicyclic diterpene analogue ACP (Supplementary Fig. 2), which does 
not correspond to any intermediate in the TXS mechanism, clearly 
demonstrates promiscuity in ligand binding. 

Taxadiene can fit in the enclosed active-site contour of TXS with two 
alternative orientations (Supplementary Fig. 4). Each orientation leads 
to possible suggestions for active-site bases that could function in the 
final deprotonation step of the cyclization cascade. Polar groups in the 
active site include S587, Q609, Y684, Y688, C719 and C830. Although
one of these residues, for example Y688, could conceivably function as
a base, taxadiene can also fit within the active-site contour such that
H5β of the preceding taxen-4-yl carbocation would be oriented
towards the inorganic pyrophosphate (PPi) product (Fig. 4c and
Supplementary Fig. 4). Thus, the PPi anion could serve as a stereospecific
base, suggesting the possibility for substrate-assisted or
product-assisted catalysis.

Finally, although the N-terminal domain and the insertion domain
of TXS form a double α-barrel class II terpenoid synthase fold such as
that characterizing the triterpene cyclases'®’®, the characteristic
general acid DXDD motif and an active-site cavity are absent.
Nevertheless, the TXS structure illuminates structure-function
relationships in other diterpene cyclases that contain catalytically active
class II cyclase domains. For example, consider the bifunctional
diterpene cyclase abietadiene synthase from the grand fir tree (Abies 
grandis). Here, the class II terpenoid cyclase domain first catalyses the 
protonation-dependent cyclization of GGPP to form (+)-copalyl 
diphosphate, and the class I terpenoid cyclase domain then catalyses 
the ionization-dependent cyclization of (+)-copalyl diphosphate to 
form abietadiene”’. Because the structures of abietadiene synthase and 
TXS are expected to be homologous, on the basis of 44% amino- 
acid sequence identity, the protonation-dependent reaction in the 
class II cyclase domain is presumably catalysed in much the same 
manner as for a triterpene cyclase reaction. In other diterpene cyclases 
such as copalyl diphosphate synthase from Arabidopsis thaliana 
(related to TXS by 31% amino-acid sequence identity), only the 
class II terpenoid cyclase domain is catalytically active; the class I
terpenoid cyclase domain is vestigial and the signature metal-binding motifs
are absent”’. Thus, biosynthetic diversity in the family of terpenoid
natural products is rooted in a ‘mix and match’ evolutionary strategy
with class I and class II terpenoid cyclase folds, which can evolve
together or separately as needed to generate the terpenoid product(s)
required by the organism.

METHODS SUMMARY


A variety of different taxadiene synthase constructs were prepared, purified and 
assessed in crystallization trials, but only one proved satisfactory for crystalliza- 
tion. This construct, designated TXS, was one in which 107 residues were deleted 
from the N terminus and a hexahistidine tag was added to the C terminus to 
facilitate purification. TXS was expressed in Escherichia coli BL21 (DE3) RIL cells, 
purified, and co-crystallized with ACP or FGP by the sitting-drop vapour- 
diffusion method. The initial electron density map of the TXS-ACP complex
was phased by using single-wavelength anomalous dispersion. After map fitting,
refinement converged smoothly to R/Rfree = 0.167/0.205. The structure of the
TXS-FGP complex was solved by molecular replacement and refined to
R/Rfree = 0.187/0.250. Data collection and refinement statistics are shown in
Supplementary Table 1. 


Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.

METHODS


Cloning, expression and purification of taxadiene synthase. Heterologous 
expression of taxadiene synthase from Taxus brevifolia lacking the N-terminal 
segment M1-V79 (M79-TXS) in Escherichia coli was achieved at the University of 
Pennsylvania, using procedures described previously’’. We found that this protein 
consistently underwent degradation at 20 °C and 4 °C over a period of a few days 
to generate a soluble polypeptide stable for at least 4 weeks. Edman sequencing 
(Wistar Institute Proteomics Facility) showed that this polypeptide lacked the first 
29 residues. Given its exceptional stability, this truncated polypeptide was con- 
sidered a good candidate for crystallization. Accordingly, the M79-TXS gene 
segment corresponding to an N-terminal truncation at R107 (M107-TXS) was 
amplified by PCR with the following forward and reverse primers with flanking
NdeI and BamHI sites, respectively: 5'-GCACATATGGAGAGTTCTACTTACCAAGAAC-3'
and 5'-GCAGGATCCTACTTGAATTGGATCAATATAAAC-3'. A variant of the pET22b
vector (pET22bTV; Novagen) was created by
PCR with the following forward and reverse primers with complementary flanking
restriction sites: 5'-GCAGGATCCCACCACCACCACCACC-3' and
5'-GCACATATGTATATCTCCTTCTTAAAGTTAAAC-3'. The gene encoding M107-TXS
and the pET22bTV vector were ligated to generate a plasmid encoding the M107- 
TXS polypeptide with a C-terminal hexahistidine tag (M107-TXS-CHT), which 
was then used to transform E. coli XL1Blue cells for amplification. The resulting 
clones were confirmed by DNA sequencing (University of Pennsylvania School of 
Medicine Sequencing Facility) to have only two silent mutations and no amino-acid 
substitutions. 
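
As a quick consistency check on the cloning design, the primers can be scanned for the restriction sites they are said to carry. This is a minimal sketch: the primer sequences are those quoted above with line breaks removed, CATATG and GGATCC are the standard NdeI and BamHI recognition sequences, and the dictionary keys and function name are illustrative labels rather than the authors' terminology.

```python
# Recognition sequences of the two restriction enzymes named in the text.
SITES = {"NdeI": "CATATG", "BamHI": "GGATCC"}

# Primers as quoted in the Methods (5' to 3'); the labels are illustrative.
primers = {
    "TXS forward": "GCACATATGGAGAGTTCTACTTACCAAGAAC",
    "TXS reverse": "GCAGGATCCTACTTGAATTGGATCAATATAAAC",
    "vector forward": "GCAGGATCCCACCACCACCACCACC",
    "vector reverse": "GCACATATGTATATCTCCTTCTTAAAGTTAAAC",
}


def sites_in(primer):
    """Names of the recognition sites found in a primer, ordered by position."""
    hits = [(primer.find(seq), name) for name, seq in SITES.items() if seq in primer]
    return [name for _, name in sorted(hits)]


for label, sequence in primers.items():
    print(label, sites_in(sequence))
```

The insert primers carry NdeI (forward) and BamHI (reverse), and the vector primers carry the complementary pair, BamHI (forward) and NdeI (reverse), which is consistent with the ligation described.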

The M107-TXS-CHT protein (henceforth designated ‘TXS’) was expressed in
E. coli BL21 (DE3) RIL cells. Transformed cell cultures were grown in 2-l flasks
containing 1 l of Luria-Bertani medium with 100 mg of ampicillin at 37 °C. At an
attenuance (D₆₀₀) of 0.6-0.7, cultures were equilibrated at 20 °C and expression
was induced by 0.25 mM isopropyl-1-thio-β-D-galactopyranoside for 16 h. Cells
were harvested by centrifugation at 6,000g for 10 min, producing about 9 g of pellet
per litre of culture. The pellet was suspended in 20 ml of buffer E (50 mM K₂HPO₄
pH 7.5, 300 mM NaCl, 10% (v/v) glycerol, 3 mM 2-mercaptoethanol) containing
1 mg ml⁻¹ lysozyme and 1 mM phenylmethylsulfonyl fluoride, then incubated at
4 °C for 2 h with shaking. Cells were disrupted by sonication on ice six times (30 s
on and 90 s off) with a large probe at medium power. Cell debris was cleared by
centrifugation twice at 30,000g for 1 h. The clear supernatant was applied to a
pre-equilibrated Talon column (Clontech Laboratories) at a flow rate of 1 ml min⁻¹
with an ÄKTAprime plus fast performance liquid chromatography system (GE
Healthcare Bio-Sciences AB). The loaded column was washed three times with
5 column volumes of buffer E, then buffer E plus 5 mM imidazole, then buffer E
plus 10 mM imidazole. TXS was eluted with a gradient of 10-200 mM imidazole in
buffer E at a flow rate of 2.5 ml min⁻¹. Selected fractions were combined,
concentrated to a volume of 5 ml, and applied to a Superdex 200 preparatory-grade
26/60 size-exclusion column (GE Healthcare Bio-Sciences AB) with buffer A (25 mM
3-(N-morpholino)-2-hydroxypropanesulfonic acid (MOPSO) pH 6.8, 10% (v/v)
glycerol, 1 mM dithiothreitol (DTT)) containing 300 mM NaCl. Fractions from
this run were combined, concentrated to a volume of 5 ml and applied to the same
column a second time with the same buffer. Fractions from the final size-exclusion
column were combined and concentrated to 8.6 mg ml⁻¹. The purity of the TXS
sample was 99% by SDS-PAGE analysis. No hexane-extractable products were 
identified by gas chromatography-mass spectrometry analysis after incubation 
with GGPP, indicating that this construct did not generate measurable amounts of 
taxadiene. 
Crystallization. TXS could not be crystallized in the absence of isoprenoid
diphosphate ligands. However, excellent crystals resulted when the protein was crystallized
in the presence of ACP or FGP and Mg²⁺ ions by the sitting-drop vapour-diffusion
method at 4 °C (ligand synthesis is outlined in Supplementary Information). To
obtain the crystals of the TXS-ACP complex, a 1-µl drop of protein solution
(5 mg ml⁻¹ TXS, 25 mM MOPSO pH 6.8, 10% glycerol, 1 mM DTT, 2.5 mM ACP, 2.5 mM
MgCl₂) was added to a 1-µl drop of precipitant solution (100 mM Bis-Tris pH 6.5,
25% polyethylene glycol 3350, 200 mM NaCl) and equilibrated against a 250-µl well
reservoir of precipitant solution. Prism-like crystals with rounded edges appeared
within 2-3 days and grew to maximal dimensions of 50 µm × 100 µm × 200 µm in
2-3 weeks. These crystals were flash-cooled after transfer to a cryoprotectant
solution consisting of the mother liquor augmented with 15% ethylene glycol. For the
preparation of a heavy-atom derivative for phasing, crystals of the TXS-ACP
complex were soaked in a cryoprotectant solution (100 mM HEPES pH 7.5, 25%
polyethylene glycol 3350, 100 mM NaCl, 100 mM MgCl₂, 10% glycerol) containing
2 mM methylmercury chloride for 22 h at 15 °C before flash-cooling.

To obtain crystals of the TXS-FGP complex, a 1-µl drop of protein solution
(5 mg ml⁻¹ TXS, 25 mM MOPSO pH 6.8, 10% glycerol, 1 mM DTT, 2.5 mM FGP,
2.5 mM MgCl₂) was added to a 1-µl drop of precipitant solution (100 mM HEPES
pH 7.0, 20% polyethylene glycol 3350, 200 mM MgCl₂) and equilibrated against a
250-µl well reservoir of precipitant solution. These crystals were flash-cooled after
transfer to a cryoprotectant solution consisting of the mother liquor augmented
with 10% glycerol.

Collection and processing of X-ray diffraction data. Crystals of the TXS-ACP
and TXS-FGP complexes diffracted X-rays to 1.82 Å and 2.25 Å resolution,
respectively, at the National Synchrotron Light Source (NSLS), Brookhaven
National Laboratory, beamline X-29, using incident radiation with λ = 0.945 Å
and 1.008 Å, respectively. Crystals of the mercury-derivatized TXS-ACP complex
diffracted X-rays to 2.6 Å resolution at NSLS beamline X-25 using incident
radiation with λ = 1.000 Å. All diffraction data were processed with HKL2000 (ref.
31). Crystals of the TXS-ACP complex belonged to space group P2₁2₁2₁ with unit
cell parameters a = 55.46 Å, b = 72.41 Å, c = 206.93 Å, with one molecule in the
asymmetric unit; the Matthews coefficient VM was 2.35 Å³ Da⁻¹ (solvent content
48%). Crystals of the TXS-FGP complex belonged to space group P2₁ with unit cell
parameters a = 54.05 Å, b = 201.98 Å, c = 81.43 Å, β = 91.60°, with two
molecules in the asymmetric unit; VM = 2.61 Å³ Da⁻¹ (solvent content 53%). Data
collection and reduction statistics are shown in Supplementary Table 1. 
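
The relationship between the reported Matthews coefficients and solvent contents can be illustrated with a short calculation. This is a sketch under a stated assumption: the solvent fraction is estimated with the conventional approximation 1 − 1.23/VM, which the text does not state explicitly, and the function names are illustrative. The unit-cell volume formula applies to cells with α = γ = 90°, which covers both space groups reported here.

```python
import math


def cell_volume(a, b, c, beta_deg=90.0):
    """Unit-cell volume in cubic angstroms for a cell with alpha = gamma = 90 degrees."""
    return a * b * c * math.sin(math.radians(beta_deg))


def solvent_fraction(v_m, protein_term=1.23):
    """Approximate solvent fraction from the Matthews coefficient (A^3 per Da).

    The constant 1.23 is the conventional value for proteins and is an
    assumption here, not a number taken from the paper.
    """
    return 1.0 - protein_term / v_m


# Cell parameters and Matthews coefficients as reported in the text.
v_acp = cell_volume(55.46, 72.41, 206.93)                 # P2(1)2(1)2(1), orthorhombic
v_fgp = cell_volume(54.05, 201.98, 81.43, beta_deg=91.60)  # P2(1), monoclinic

print(round(v_acp), round(100 * solvent_fraction(2.35)))
print(round(v_fgp), round(100 * solvent_fraction(2.61)))
```

With this approximation the reported Matthews coefficients of 2.35 and 2.61 correspond to solvent contents of about 48% and 53%, matching the values quoted.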
Phasing and structure refinement. The initial electron density map of the
TXS-ACP complex was phased by single-wavelength anomalous dispersion (SAD) with
the 2.6-Å resolution data collected from the methylmercury chloride derivative.
Initially, six Hg²⁺ atoms were located by using the program HKL2MAP” and used
for SAD phasing, search and refinement of an additional seven Hg²⁺ sites; density
modification, initial electron density map calculation and automatic model
building were performed with the AUTOSOL routine implemented in PHENIX”. This
procedure built more than 50% of the protein residues into the initial electron
density map, most of which were α-helices. Manual model building subsequently
generated an initial model with 90% of the residues registered in the sequence. This
model was used for molecular replacement calculations with the AUTOMR routine
implemented in PHENIX with the 1.82-Å resolution data collected from the
TXS-ACP complex. Initial rigid-body refinement, iterative cycles of positional
refinement, and grouped and individual atomic B-factor refinement were performed with
PHENIX. Manual model rebuilding was performed with COOT*. Water
molecules, Mg²⁺ ions and the ACP molecule were included in later cycles of refinement.
A total of 745 out of 764 residues are present in the final model of the TXS-ACP
complex; disordered segments excluded from the final model include N-terminal
residues M107-S110 (M107 is the N terminus of the construct), the C-terminal
hexahistidine tag and its associated linker residues (G863-H870) and surface loop
I838-A844. An electron density map of the TXS-ACP complex is shown in
Supplementary Fig. 2. 
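
The residue accounting for the TXS-ACP model can be reproduced from the segment boundaries given above. The sketch assumes that the construct runs from M107 to H870, which is inferred from the stated N terminus and the last residue of the hexahistidine tag; the names used are illustrative.

```python
def span_length(first, last):
    """Number of residues in an inclusive residue range."""
    return last - first + 1


# Construct boundaries inferred from the text (M107 to H870).
construct_residues = span_length(107, 870)

# Disordered segments excluded from the final TXS-ACP model.
disordered = {
    "N-terminal residues": (107, 110),
    "linker and hexahistidine tag": (863, 870),
    "surface loop": (838, 844),
}

missing = sum(span_length(first, last) for first, last in disordered.values())
modelled = construct_residues - missing

print(construct_residues, missing, modelled)
```

The three disordered segments account for 19 residues, leaving 745 of 764 in the model, in agreement with the reported total.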

The model of the TXS-ACP complex without its ligand and solvent atoms was
used as a search probe for molecular replacement calculations to solve the
structure of the TXS-FGP complex at 2.25 Å resolution. Rigid-body refinement,
positional refinement and grouped and individual atomic B-factor refinement were
performed with PHENIX. Manual model rebuilding was performed with COOT.
In the final model of the TXS-FGP complex, 746 and 736 out of 764 residues were
present in monomers A and B, respectively. Disordered segments excluded from
the final models of monomers A and B included N-terminal residues M107-S110,
the C-terminal hexahistidine tag and its associated linker residues (G863-H870)
and loop I574-R578; in addition, surface loop F837-E846 was disordered in
monomer B.

For both structures, data reduction and refinement statistics are shown in Sup- 
plementary Table 1. Ramachandran plot statistics, calculated with PROCHECK”*, 
were as follows. TXS-ACP complex: allowed, 93.7%; additionally allowed, 5.9%; 
generously allowed, 0.3%; disallowed, 0.1%. TXS-FGP complex: allowed, 91.3%; 
additionally allowed, 8.1%; generously allowed, 0.4%; disallowed, 0.1%. Simulated- 
annealing omit maps were calculated with CNS”. Protein structure figures were 
prepared with the graphics program PyMol (http://www.pymol.org/). 

Model of TXS in the fully closed conformation. To model the N terminus and
J-K loop segments of TXS in a fully closed conformation and to calculate the
active-site cavity volume, the N terminus (residues 54-81) and J-K loop (residues
574-587) segments of bornyl diphosphate synthase in its complex with three
Mg²⁺ ions and 3-azageranyl diphosphate (PDB accession code 1N20) were
‘grafted’ onto the structure of the TXS-Mg²⁺₃-FGP complex and mutated to the
corresponding residues of TXS; S110 was also introduced to account for an
insertion in the sequence alignment between TXS and bornyl diphosphate synthase.
The conformations of the grafted segments were then subjected to 10,000 steps of
gas-phase conjugate gradient energy minimization, using NAMD” and the
CHARMM22 force field**. During energy minimization the grafted segments plus
three adjacent residues on the N-terminal and C-terminal ends were
unconstrained, while the remaining heavy atoms were fixed. Non-bonded cutoff and
switch distances were set to 12 Å and 10 Å, respectively. The final structure
resulting from this computation was used as the hypothetical fully closed conformation
of TXS. The meshwork representing the active-site cavity of TXS was calculated
with VOIDOO®, using a probe with a radius of 1.4 Å to generate a molecular
surface based on solvent accessibility. To study product-binding orientations in
the enclosed active site, a model of taxadiene was constructed on the basis of the
coordinates of the taxane core of Taxol deposited in the Cambridge
Crystallographic Data Centre with accession code TEYPAO”.

The active-site cavity volume of TXS was compared with the active-site cavity 
volumes of other terpenoid cyclases, their substrates and their products (Fig. 4a 
and Supplementary Table 2). All volume calculations were performed with 
VOIDOO, using a probe with a radius of 0.0 Å to generate a molecular surface
based on the atomic van der Waals radii. Because the active site of the hemiterpene
synthase isoprene synthase was not fully closed as a result of disorder of the J-K
loop, the active-site contour was artificially truncated by the placement of ‘dummy’
atoms to estimate the boundary of the fully enclosed cavity. We used a similar
approach to model the cavity of TXS in a chemically sensible manner.

Genetic variegation of clonal architecture
and propagating cells in leukaemia 


Kristina Anderson’, Christoph Lutz’, Frederik W. van Delft!, Caroline M. Bateman!, Yanping Guo’, Susan M. Colman', 
Helena Kempski*, Anthony V. Moorman’, Jan Titley!, John Swansbury!, Lyndal Kearney’, Tariq Enver”} & Mel Greaves! 


Little is known of the genetic architecture of cancer at the subclonal and single-cell level or in the cells responsible for 
cancer clone maintenance and propagation. Here we have examined this issue in childhood acute lymphoblastic leukaemia 
in which the ETV6-RUNX1 gene fusion is an early or initiating genetic lesion followed by a modest number of recurrent or 
‘driver’ copy number alterations. By multiplexing fluorescence in situ hybridization probes for these mutations, up to eight 
genetic abnormalities can be detected in single cells, a genetic signature of subclones identified and a composite picture of 
subclonal architecture and putative ancestral trees assembled. Subclones in acute lymphoblastic leukaemia have variegated 
genetics and complex, nonlinear or branching evolutionary histories. Copy number alterations are independently and 
reiteratively acquired in subclones of individual patients, and in no preferential order. Clonal architecture is dynamic and is 
subject to change in the lead-up to a diagnosis and in relapse. Leukaemia propagating cells, assayed by serial transplantation 
in NOD/SCID IL2Rγnull mice, are also genetically variegated, mirroring subclonal patterns, and vary in competitive
regenerative capacity in vivo. These data have implications for cancer genomics and for the targeted therapy of cancer. 


Recent genome-wide scrutiny of cancer cells has revealed extraordinary 
complexity, with substantial numbers of both potential ‘driver’ and 
neutral or ‘passenger’ mutations per case’. Informative though these 
screens are, they probably reflect predominant or composite genetic 
landscapes that obscure the existence of subclonal heterogeneity of 
disease’. Intraclonal genetic diversity is a common feature of cancer* 
and is probably, from a Darwinian, natural selection perspective, the 
essential substrate for clonal evolution, disease progression, relapse or 
metastasis. Subclonal genetic complexity might also be an important 
consideration for therapeutic targeting. Furthermore, if a subset of 
‘stem-like’ cancer cells, or, as we refer to, propagating cells, are the basis 
of sustained clonal expansion and disease progression’ then, in principle, 
they should be genetically diverse if selection and passage through evolu- 
tionary bottlenecks is to occur. 

Identifying intraclonal genetic architecture requires genetic 
scrutiny of single cells or clonal foci, and there are limited examples 
of this so far®; nevertheless, they testify to the existence of significant 
heterogeneity. The genetic diversity of cancer propagating cells is, as 
yet, unexplored. We elected to address this issue in lymphoblastic 
leukaemia. The substantial advantage of this cancer, in addition to 
its amenability to single-cell analysis, is that it is minimally deranged 
or unstable, genetically, and the broad, temporal sequence of genetic 
events is known. For the B-cell precursor subset of childhood acute 
lymphoblastic leukaemia (ALL) with ETV6-RUNX1 fusion studied 
here, the latter genetic lesion is predominantly a prenatal and pre- 
sumed initiating event’. It is coupled with a modest number (3-6) of 
recurrent, genomic copy number alterations (CNA)*. These accrue as 
secondary and, most likely, postnatal lesions? in genes that, predomi- 
nantly, regulate the cell cycle or B-cell differentiation’. 


Subclonal diversity of genotypes in ALL 


We initially selected 60 cases of ETV6-RUNX1-positive ALL in 
which ETV6 was also deleted (15-85% of cells), as detected by fluorescence 
in situ hybridization (FISH). Of these, 30 were further selected (see 
Supplementary Table 1) that also had (by FISH) deletion of PAX5 
(n = 15) or CDKN2A (also called p16) (n = 12) or deletions of both 
PAX5 and CDKN2A (n = 3) in at least 10% of cells. All 30 cases were 
then scrutinized using a multiplexed combination of distinctive 
fluorochrome-labelled bacterial artificial chromosome (BAC) probes. 
Two-hundred cells with the ETV6-RUNX1 fusion signal (that is, the 
reference founder mutation present in all leukaemic cells) were 
evaluated for each case and each individual cell was assigned an allele status 
(that is, mono- or bi-allelic deletion) for ETV6 and PAX5 (three colour) 
or ETV6 and CDKN2A (three colour) or ETV6, PAX5 and CDKN2A 
(four colour). The use of an ETV6-RUNX1 probe also allowed us to 
detect duplication of the fusion gene (in 15 out of 30 cases) or an extra 
copy of chromosome 21q (via RUNX1 signal copy number; in 21 out of 
30 cases). The latter is a common genetic abnormality in ALL and an 
assumed driver event’. Cutoff levels (%) for scoring genetically dis- 
tinctive subclones were determined using normal blood controls and 
varied depending upon probe set combination. A threshold was set at 
2% for cells with a single CNA (in addition to ETV6-RUNX1 fusion) 
and 1% for cells with two or more CNA (see Methods and Sup- 
plementary Table 2). 
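
The scoring and thresholding step described above can be sketched in a few lines. This is an illustration only, not the authors' analysis pipeline: the function name `call_subclones`, the encoding of each cell as a set of CNA labels and the example counts are all hypothetical. Only the 2% and 1% thresholds come from the text, and applying the 2% cutoff to fusion-only cells is an assumption.

```python
from collections import Counter

def call_subclones(cells, single_cna_cutoff=0.02, multi_cna_cutoff=0.01):
    """Return {genotype: frequency} for genotypes at or above the cutoff.

    cells: iterable of frozensets, one per scored fusion-positive cell, each
    holding the CNA labels seen in that cell (hypothetical encoding).
    """
    cells = list(cells)
    counts = Counter(cells)
    total = len(cells)
    called = {}
    for genotype, n in counts.items():
        freq = n / total
        # Assumption: genotypes with zero or one CNA use the 2% cutoff;
        # genotypes with two or more CNA use the 1% cutoff.
        cutoff = multi_cna_cutoff if len(genotype) >= 2 else single_cna_cutoff
        if freq >= cutoff:
            called[genotype] = freq
    return called

# Hypothetical counts for 200 scored cells (not data from the paper).
example_cells = (
    [frozenset({"ETV6-del"})] * 120
    + [frozenset({"ETV6-del", "PAX5-del"})] * 74
    + [frozenset({"PAX5-del"})] * 3
    + [frozenset({"ETV6-del", "PAX5-del", "CDKN2A-del"})] * 3
)
subclones = call_subclones(example_cells)
```

In this made-up example the three-cell population with a single CNA (1.5%) falls below its 2% cutoff, whereas the three-cell population with three CNA passes the 1% cutoff. In practice the cutoffs varied with the probe combination (Supplementary Table 2).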

Enumeration of CNA in individual cells in reference to ETV6- 
RUNX1 fusion—the universal marker of all the leukaemic cells— 
allowed us to identify distinctive genetic signatures of subclones 
and their relative frequencies. From this we could infer the most likely 
evolutionary or ancestral relationships between the subclones and 
derive a clonal architecture. 
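
One simple way to think about inferring ancestral relationships from subclone genotypes is a parsimony rule: a subclone's most likely parent is the subclone that already carries a subset of its deletions and differs by the fewest additional events. The sketch below is a hypothetical illustration of that idea, not the procedure used in the paper. The function name `infer_parents` and the genotype encoding are assumptions, relative subclone frequencies are ignored, and deletions are assumed never to revert.

```python
def infer_parents(genotypes):
    """Map each subclone name to its most parsimonious candidate parent(s).

    genotypes: {name: {gene: alleles_deleted}} (hypothetical encoding; an
    empty dict stands for cells carrying only the ETV6-RUNX1 fusion).
    """
    def distance(parent, child):
        genes = set(parent) | set(child)
        steps = [child.get(g, 0) - parent.get(g, 0) for g in genes]
        # A parent cannot carry more deleted alleles than its descendant
        # (assumes deletions are never reverted).
        if any(s < 0 for s in steps):
            return None
        return sum(steps)

    parents = {}
    for name, geno in genotypes.items():
        scored = []
        for other, other_geno in genotypes.items():
            if other == name:
                continue
            d = distance(other_geno, geno)
            if d:  # skip None (incompatible) and 0 (identical genotype)
                scored.append((d, other))
        best = min((d for d, _ in scored), default=None)
        # More than one entry means the origin is ambiguous.
        parents[name] = sorted(o for d, o in scored if d == best)
    return parents

# Hypothetical subclones (not data from the paper).
example = {
    "A": {},
    "B": {"ETV6": 1},
    "C": {"ETV6": 1, "PAX5": 1},
    "D": {"PAX5": 1},
    "E": {"ETV6": 1, "PAX5": 2},
}
tree = infer_parents(example)
```

Here subclone C has two equally parsimonious parents (B and D), which corresponds to a case where the origin of a subclone is ambiguous and the same deletion may have arisen more than once.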

The genetic architectures that were observed were very diverse. The 
simplest of these genetic architectures that we identified (in 6 cases) 
involved two or three subclones that could be aligned in a linear 
sequence (Fig. 1a); however, these were cases with the lowest 
complement of CNA (only two or three out of the possible total of 
the seven pre-selected) in addition to the ETV6-RUNX1 fusion. All 
other 24 cases had a more marked subclonal heterogeneity with up to 
ten subclones related via a branching ancestral tree (Fig. 1b, c). 
Figure 1a-c illustrates examples of the clonal architectures observed 
(all other cases are depicted in Supplementary Fig. 2). 

Figure 1 | Examples of subclonal architecture in ALL. a, Apparent linear 
architecture, with three clones (patient no. 13). Representative FISH images on 
the right show examples of each subclone. b, Moderately complex architecture 
with five subclones (patient no. 8). Loss of the untranslocated ETV6 allele 
occurs independently in three separate subclones (boxes). c, Complex 
architecture with eight subclones (patient no. 16). PAX5 deletions occur 
independently in two separate subclones (boxes). Arrows indicate probable (or 
most likely) ancestral derivation of subclones; dashed arrows indicate possible 
(or alternative) origins of subclones. F, yellow signal, ETV6-RUNX1 fusion 
gene; 2 RUNX1, two red signals (one large, one small) corresponding to one 
normal RUNX1 allele, and one small remnant generated from disruption of 
RUNX1 allele involved in the gene fusion; ETV6, green signal corresponding to 
the normal (untranslocated) ETV6 allele; PAX5 (or CDKN2A), pink signal. 

Inspection of clonal genotypes reveals some previously unrecognized 
features. It is apparent that the common or highly recurrent CNA are 
not acquired in any preferential order, indicating that their potency as 
oncogenic mutations may not be contingent upon (or epistatic to) 
other CNA. Subclones with the highest number of CNA, positioned 
‘terminally’ in the branching architecture, were not necessarily numer- 
ically dominant (for example, all three cases illustrated in Fig. 1). 
Unexpectedly, CNA involving the same gene could be simultaneously 
present in distinct subclones and must therefore arise more than once, 
independently. ETV6 was independently deleted two to three times in 
14 of the 30 cases (see Fig. 1b, c), PAX5 deleted two or three times in 8 
out of 18 cases (see Fig. 1c) and CDKN2A deleted two times in 4 of 15 
cases. This raises interesting mechanistic questions and suggests that 
these lesions are not only selected on the basis of clonal advantage but 
may be targeted for DNA-level breakage. One possible mechanism is 
via off-target effects of RAGs or AID". 


Immunophenotypes and genetic diversity 


There is a spectrum of early B-lineage differentiation-linked immu- 
nophenotypic signatures in ALL“ and evidence has been presented 
indicating that cells with several different antibody-defined pheno- 
types may have leukaemia propagating activity in vivo’. We analysed 
the genetic heterogeneity of cells flow sorted on the basis of their 
expression of CD34 (immature lineage marker) or CD20 (more 
mature B lineage marker). Sorted populations had similarly complex 
genetic architectures (Supplementary Fig. 3). 

Cells with the immunophenotype CD34+CD38−/lowCD19+ appear, 
so far, to be unique to ALL”. We previously found this pro-B/stem 
population, possibly non-activated or quiescent (CD38−), to be significantly 
enriched in ALL propagating cells when assayed in the NOD/SCID 
strain of mice'’. When purified by cell sorting, these cells (from 
patient no. 7) had similar genetic complexity to the bulk leukaemic 
population (Supplementary Fig. 3). 


Clonal architecture in ALL is dynamic 


These descriptions of subclonal, genetic profiles in ALL are snapshots 
taken at a particular time point, that is, at diagnosis. It is likely that 
subclonal diversity and the relative dominance of subclones varies 
continuously with the development and progression of disease. ALL 
rarely has an identified prodromal phase but occasionally (~2%) 
patients with ALL have an aplastic, pre-leukaemic phase a few months 
before a diagnosis of leukaemia’*. We previously described one such 
patient with ETV6-RUNX1+ ALL'’. The diagnostic ALL cells had 
ETV6-RUNX1 fusion but no ETV6 deletion. Single nucleotide poly- 
morphism (SNP) array screening revealed multiple deletions includ- 
ing BTG1 and 11q and gain of chromosome X. We compared the 
clonal genotypes of cells from this patient at these two time points, 
spread some 7 months apart, and observed a marked shift in clonal 
architecture (Fig. 2a). The subclones dominating the aplastic, pre- 
malignant phase were relegated to minor, intermediary subclones in 
the overt leukaemic phase, with dominance of progeny clones that 
had homozygously deleted CDKN2A. A second patient with a pro- 
dromal, aplastic phase some 3 months before a diagnosis of ALL also 
showed a shift in subclonal dominance (Supplementary Fig. 4). 
Treatment and subsequent relapse in ALL reorders the spectrum of 
genetic abnormalities detected by single gene probing” or SNP 
arrays” reflecting the probable selection of distinct subclones as a 
basis of relapse*’. For five of the thirty selected patients (numbers 6, 
9, 28, 29 and 30), we had matched diagnosis and relapse cells available 
for multiplexed FISH analysis. Clonal architecture at relapse was dif- 
ferent from that at diagnosis. The comparative genetic profiles of 
subclones allowed us to identify the most likely subclone giving rise 
to the relapse, although this attribution was not unambiguous (Fig. 2b 
and Supplementary Fig. 2). Relapse seems to derive from either major 
or minor clones at diagnosis as previously suggested*' but with a 
suggestion that more than one subclone might contribute to relapse 
(for example, patients 6 and 30; Supplementary Fig. 2). The data also 
indicate that the dominant subclone in relapse itself continues to 
genetically diversify, in some cases acquiring genetic lesions in the 
same gene (or chromosome region) as observed in primary, diagnostic 
subclones. This, along with previous observations on distinctive ETV6 
deletions in relapse versus diagnosis”, provides further evidence for 
reiterative CNA. The patterns of genetic diversity observed in relapse 
indicate that genetically distinct leukaemic propagating cells can 
survive chemotherapy and provide a reservoir for relapse and further 
diversification. 

Figure 2 | Changes in clonal architecture in ALL. a, Scoring for ETV6-RUNX1 
fusion and simultaneous deletion of CDKN2A and BTG1 in blast cells 
at diagnosis of ALL and in bone marrow taken 7 months earlier during a 
pre-leukaemic aplasia phase. During the pre-leukaemic aplasia phase no deletion of 
CDKN2A was present, although several other copy number abnormalities were 
present including 11q, 15q, 5q and BTG1 gene deletions as well as gain of Xq 
(ref. 19). At diagnosis of ALL some 7 months later, the predominant clone 
contained homozygous deletion of CDKN2A. Only clones above the cutoffs are 
shown. Representative, four-colour FISH pictures for the dominant clones 
during the aplasia and leukaemic phases are shown at the bottom. b, An 
example of relapse originating from a major clone at diagnosis (patient no. 9) 
(other matched relapse cases for patients 6, 28, 29, 30 are in Supplementary Fig. 
2). R, probable clonal origin of relapse. 


Genetic diversity of propagating cells in ALL 

Within the genetic architecture of ALL, it cannot be assumed that all 
identified subclones are self-sustaining and propagated by cells with 
extensive self-renewing capacity’. As in evolutionary speciation, it is 
likely that some branches or subclones are long-lived whereas others are 
dead ends or out-competed. Nevertheless, the architectural patterns 
that we observed suggested the possibility that propagating cells for 
ALL might also have variegated genetics and that this should be demon- 
strable via serial transplantation in immunodeficient mice. We trans- 
planted, intra-tibially, varying numbers (2 x 10°-10°) of unfractionated 
or immunophenotypically flow-sorted leukaemic cells into 
pre-irradiated NOD/SCID IL2Rγnull mice. Expanded leukaemic popula- 
tions were re-transplanted into secondary recipient mice as a validation 
of self-renewal capacity. We compared the genetic signatures of the cell 
populations that emerged by successful regeneration in vivo, from first 
and secondary transplants, with those in the original diagnostic sample. 
Mice with regenerated ALL had significant proportions (3.1 to 93.5%, average 
59.4%; Supplementary Tables 3 and 4) of human haematopoietic 
(CD45+) cells in the marrow and large, pale spleens (Supplementary 
Fig. 5b, c). Effectively all (>99%) human CD45+ cells were leukaemic 
with the ETV6-RUNX1 fusion (Supplementary Fig. 5d). Genetic ana- 
lysis was carried out on cells harvested from bone marrow but when 
assessed, spleen provided the same result (Supplementary Table 4). 

Leukaemic regeneration in vivo was observed consistently in both 
unfractionated populations and in fractions defined immunopheno- 
typically as CD34+CD38−/lowCD19+ and CD34+CD38+CD19+ (Sup- 
plementary Tables 3 and 4). This accords with previous evidence’ that 
propagating cells in B-cell precursor ALL are not restricted to one 
immunophenotypic compartment. 

In all 24 mice with primary or secondary leukaemic regeneration, 
several genetically distinct subclones were present, the patterns of 
which reflected the diversity of subclones identified in the original 
diagnostic sample (Figs 3 and 4 and Supplementary Tables 3 and 4). 
The leukaemic cells regenerated in secondary transplants were com- 
pared to pre-transplant primary cells by high-resolution SNP arrays. 
For patient no. 3, these data confirmed subclonal loss of CDKN2A, 
subclonal gain of chromosome 21 and loss of one copy of ETV6 in 
both diagnostic and regenerated samples (Supplementary Fig. 6 and 
Supplementary Table 3). 

Figure 3 | Genetics of cells propagating NOD/SCID IL2Rγnull mice. 
Leukaemic cells from patient no. 3 before injection (a), after primary 
transplantation (b; mouse 1, Supplementary Table 3) and after secondary 
transplantation (c; mouse 2, Supplementary Table 3). Representative FISH 
images are shown of the four subclones (B-E) and the putative pre-leukaemic 
cell (A) at the top in a. Boxed cell, below significance threshold. 

Figure 4 | Shift in clonal architecture of ALL after in vivo NOD/SCID 
IL2Rγnull transplantation. Clonal architecture of unsorted cells from patient 
no. 7 before injection (a), after primary transplantation (b; mouse 1, 
Supplementary Table 4) and after secondary transplantation (c; mouse 2, 
Supplementary Table 4). d, Summary of SNP data on diagnostic versus 
secondary transplant ALL cells. The blue lines indicate the mean copy number 
plot of five contiguous SNPs. Middle line, normal diploid copy number. Two 
fractions of DNA have been examined using 500K SNP arrays: diagnostic DNA 
(unsorted cells from patient no. 7 before injection) and DNA from leukaemic 
cells regenerated in mice (mouse 2, Supplementary Table 4). Deletions of 
CDKN2A, BTLA and BTG1 are present in both diagnostic and regenerated 
samples, whereas deletion of ETV6, subclonal gain of the 13q31.3-q34 region 
and gain of chromosome 22 are distinctive between the two samples. Plus and 
minus symbols indicate gains or losses of genetic regions, respectively. Arrows 
highlight CNA. 

The genetic profiles of regenerated leukaemias revealed variable 
potency of genetically distinct subclones. For patient no. 3 (Fig. 3 
and Supplementary Table 3), four subclones read out in primary 
and secondary transplants with three being co-dominant. The putative 
pre-leukaemic clone (with ETV6-RUNX1 fusion only) did not regenerate. 
With unfractionated cells from patient no. 7 (Fig. 4 and Supplementary 
Table 4), four of six subclones regenerated upon secondary transplanta- 
tion. One of these (subclone F in Fig. 4) was dominant despite being a 
minor (11%) subclone in the initial diagnostic sample. SNP arrays were 
used to compare the primary diagnostic sample (patient no. 7) versus 
the regenerated (secondary) transplant leukaemias. These revealed that 
both leukaemic populations had BTLA and BTG1 deletions (Fig. 4d) in 
addition to the CNA in CDKN2A, ETV6 and chromosome 21. The 
regenerated leukaemia had an additional chromosome 22 that appeared 
to be absent from the initial, diagnostic cell population (Fig. 4d). This 
was further investigated by FISH using a chromosome-22-specific BAC 
probe. This confirmed the SNP array data with the majority of cells 
(with three RUNX1 and one ETV6 signals) in the regenerated leukaemia 
having an extra chromosome 22 signal (Supplementary Fig. 7). No cells 
with an extra chromosome 22 signal were detectable in the initial 
diagnostic sample in accord with the SNP array data. However, because 
clone F was dominant in all 9 of 9 mice transplanted with the same cell 
population (Supplementary Table 4), we assume that a minor subclone 
of clone F with the extra chromosome 22 was present in the diagnostic 
sample but at a <1% frequency. Variable competitive potency of 
subclonal regeneration was also seen in mice injected with immuno- 
logically fractionated cells (Supplementary Table 4 and Supplementary 
Figs 7 and 8). 

These data are indicative of additional genetic complexity of sub- 
clones and their propagating cells. Moreover, they indicate that distinc- 
tive genotypes are associated, functionally, with variable competitive 
regeneration in vivo. 


Discussion 


Cancer development at the cellular level is widely regarded as a Darwinian 
evolutionary process involving ‘natural selection’ of genetically variant 
cells in the context of a complex micro-environmental ecology”. 
Mutational and phenotypic diversity between cells is, in principle, fun- 
damental to this process. Moreover, driver mutations can be expected to 
have maximal selective currency when present in cells with self-renewing 
functionality. 

Evidence for intraclonal genetic diversity in cancer has been pro- 
vided by chromosome karyotype”’, by genetic analysis of multi-focal 
(but monoclonal) cancers*®, by FISH-based screening of tissue sec- 
tions’? or immuno-selected cells*®, by the molecular probing of 
multiple small biopsies’ or of micro-dissected tissue*” ** and, recently, 
by sector-ploidy profiling’. Small numbers of individual circulating 
tumour cells have also been scrutinized for their divergent genetic 
profiles**’’. These studies collectively testify that contemporaneous 
intraclonal genetic heterogeneity is commonplace and, in some cases 
at least, the degree of clonal diversity is predictive of disease progres- 
sion*'. Most of these data derive from epithelial carcinomas with com- 
plex genetic profiles, coupled, in most cases, to genetic instability. In 
such cases the historical timing and sequence of critical or driver 
mutational events is effectively buried and clonal architecture could 
be extremely complex unless clonal dominance occurs. 

A common assumption for both leukaemias and cancer in general, 
based on the original evolutionary model of ref. 22, is that progression of 
disease and predominant genetic profiles reflect sequentially dominant 
clones and an essentially linear dynamic. Our data (summarized in 
Supplementary Fig. 1) suggest dynamic patterns of subclonal develop- 
ment and ancestral relationships that are nonlinear with a variable 
branching architecture. Patterns of genetic diversity in other 
cancers—assessed by single cell or ploidy sorted cell comparative geno- 
mic hybridization (CGH)****”, oncogenically neutral microsatellite 
markers**” or deep-sequenced IGH gene rearrangements*'—also indi- 
cate nonlinear, branching clonal trajectories. Collectively, these data 
indicate that cancer has a cellular and genetic architecture reminiscent 
of Darwin’s iconic evolutionary tree (or bush) diagram depicting 
speciation”. 

The extent of genetic variegation in subclones that we detect must 
be a significant underestimate. We screened for a limited number of 
pre-selected CNA, which means that other CNA plus any sequence- 
based driver mutations present will not have been registered. Moreover, 
antecedent or intermediary subclones, below the 1-2% frequency which 
we set as a threshold, were identified with more extensive screening in 
several cases (see Supplementary Fig. 2 patients 2, 5, 15, 18, 20, 6). 
Identifying the full complexity of subclonal architecture and genetic 
diversity in ALL (and other cancers) will ultimately require whole- 
genome analysis at the single cell level. 

Our data provide the first direct evidence for genetic diversity of 
cancer propagating cells within individual patients. The consistent 
patterns of subclonal regeneration in mice (that is, in different mice 
injected with the same cellular inoculum) suggest variable capacity 
intrinsically associated with the distinct genotypes of propagating 
cells. The competitive potency of particular subclones observed, 
however, may to some extent reflect selective pressures exerted by 
regenerative stress in a murine tissue environment. Natural clonal 
selection in patients might produce different outcomes. 

It will be important to assess if genetic diversity of propagating cells 
holds true for other types of leukaemia and cancer in general. If it 
does, then there would be significant implications for both the cancer- 
stem-cell concept itself and for the therapeutic targeting of such cells. 
The original model of a distinct, hierarchically positioned subpopula- 
tion of cancer stem cells’ has proved contentious in both ALL’’ and 
other cancers***. It has been suggested that the NOD/SCID in vivo 
readout for human cancer stem cells may, at least for some cancers, 
simply register dominant subclones****, or, alternatively, that cancer 
stem cells exist but evolve over time****. We have previously docu- 
mented that ‘pre-leukaemic’ and overt leukaemia propagating cells in 
ETV6-RUNX1-positive ALL, although clonally related by descent, are 
distinctive in IgH rearrangements and phenotype’’. Our current data 
fit best with what we refer to as a ‘back to Darwin’ model for cancer 
propagating cells and resultant clonal architecture**. In this, cells with 
self-renewing properties have variegated genotypes providing the 
units of selection in the evolutionary diversification and progression 
of disease. Both sequential and concurrent genotypic variation in 
propagating cells occur in ALL and, we predict, are likely to do so 
in other cancers, providing a rich substrate for progression of disease. 
Although it has yet to be evaluated, it is likely that genetic diversity of 
cancer propagating cells will be associated with both frequency vari- 
ation and diversity of functional properties, for example, differenti- 
ation status, niche occupancy, quiescence and drug or irradiation 
sensitivity. This may help to explain some of the inconsistencies 
and controversies in the cancer-stem-cell field** *”**”. Genetic diversity 
in cancer varies in extent with stage of disease’**’, probably reflecting 
the impact of intraclonal competition and ecological bottlenecks. Single 
cells might negotiate very stringent bottlenecks but the genetic profiles 
that we observed in relapsed ALL and as recorded in, for example, 
prostate cancer metastases*””° indicate continued diversification of 
propagating cells and dominant or therapy-resistant subclones. 

This perspective contrasts with the unidimensional or flat (albeit 
very complex) genetic landscapes of cancer implied in portraits derived 
from whole-genome scans. This architectural distinction may be of 
some clinical consequence. Targeted therapy, if directed at mutant 
molecules, may have limited efficacy if the targets themselves are not 
initiating lesions but secondary mutations segregated in subclones, 
even when the latter appear dominant. Genetic variegation of cancer 
propagating cells may represent a significant roadblock to effective 
therapy. 


METHODS SUMMARY 


Archival methanol:acetic-acid-fixed cytogenetic pellets from patients with ETV6- 
RUNX1 fusion-gene-positive ALL were obtained from several UK hospitals, with 
local ethical review committee approval (CCR 2285, Royal Marsden Hospital 
NHS Foundation Trust). The clinical and cytogenetic data on these patients are 
given in Supplementary Table 1. Interphase FISH was performed as previously 
described”. 

In each case, at least 200 nuclei were scored for the presence of the ETV6- 
RUNX1 fusion gene in combination with hemizygous or homozygous deletion of 
ETV6, RUNX1, PAX5, CDKN2A and BTG1 and 11q, as well as gain of RUNX1 and 
duplication of the ETV6-RUNX1 fusion gene. Controls included the scoring of 
residual normal cells within the diagnostic sample and scoring leukaemic cells 
with probes hybridizing to irrelevant oncogenes (BCR, ABL) (see main Methods 
and Supplementary Figs 9 and 10). 

NOD/SCID IL2Rγnull mice that lack any B, T and natural killer cell activity were 
bred and maintained under sterile conditions in accordance with Home Office 
regulations. Transplantation of cells was by intra-tibial injections in 7-14-week- 
old mice after 250 cGy irradiation. Peripheral engraftment was assessed at 
9-10 weeks after transplantation and, if it was >2%, mice were killed. Further analysis 
included the assessment of bone marrow/spleen engraftment, FISH analysis, his- 
tological analysis and serial transplantation. For serial transplantations, recovered 
bone marrow cells were stained with human CD45 to detect human engraftment. 
An equivalent of 2 X 10° to 2 X 10° human cells was transplanted by intra-tibial 
injections. 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 


METHODS 


Interphase fluorescence in situ hybridization (FISH). Archival methanol:acetic- 
acid-fixed cytogenetic pellets from patients with ETV6-RUNX1 fusion-gene- 
positive ALL were obtained from several UK hospitals, with local ethical review 
committee approval (CCR 2285, Royal Marsden Hospital NHS Foundation Trust). 
Interphase FISH for the ETV6-RUNX1 fusion gene was performed using a com- 
mercial LSI TEL-AML1 extra signal (ES) probe (Vysis, Abbott Laboratories Ltd) 
according to the manufacturers’ instructions. This probe set contains a 350-kb 
probe for the 5’ end of ETV6 (exons 1-4) and a 500-kb probe covering the entire 
RUNX1 gene. The FISH signal pattern for the ETV6-RUNX1 fusion-gene-positive 
cells using the Vysis probe is two red (one large, one small RUNX1 signals), one 
green (ETV6 allele not involved in the translocation) and one red/green (yellow) fusion 
signal corresponding to the ETV6-RUNX1 fusion gene. Bacterial artificial chro- 
mosome (BAC) or fosmid probes for the PAX5, CDKN2A, BTG1, TBL1XR1 genes, 
11q and other regions of interest were obtained from the BACPAC Resource 
Centre, Children’s Hospital, Oakland Research Institute (http://bacpac.chori.org). 
These were labelled by nick translation with biotin-16-dUTP or digoxigenin-11- 
dUTP (Roche) and hybridized in combination with the ETV6-RUNX1 ES probe. 
FISH was performed by standard protocols”’” and labelled probes detected with 
streptavidin-Cy5 (biotinylated probes) and (1) monoclonal anti-digoxigenin 
(Sigma), (2) horse anti-mouse IgG-Texas red (Vector Laboratories) and (3) goat 
anti-horse IgG-Texas Red (Jackson Immunochemicals) (for digoxigenin-labelled 
probes). Fluorescent signals were viewed using an Olympus AX2 fluorescence 
microscope equipped with narrow bandpass filters for DAPI, FITC, Spectrum 
orange, Texas red and Cy5. Images were captured and analysed using a charge- 
coupled device (Photometrics) and SmartCapture 3 software version 3.0.4 (Digital 
Scientific). 

Establishing cutoff levels. In each case, at least 200 nuclei were scored for the 
presence of the ETV6-RUNX1 fusion gene in combination with hemizygous or 
homozygous deletion of ETV6, RUNX1, PAX5, CDKN2A and BTG1 and 11q, as 
well as gain of RUNX1 and the ETV6-RUNX1 fusion gene. Diagnostic slides from 
26 cases were assessed for hybridization efficiency by scoring the residual normal 
(ETV6-RUNX1 fusion negative) cells on the same slide (see Supplementary Fig. 9). 
The percentage of these cells with loss of a single CDKN2A or PAX5 signal was 
0-3% (mean = 1%). As a further control, we hybridized a subset (n = 11) of 
ETV6-RUNX1 fusion-gene-positive cases with uninvolved oncogene probes 
BCR and ABL (see Supplementary Fig. 10). The percentage of cells with the 
expected normal signal pattern (two red, two green) was 96-99% (mean = 98%). 
Cutoff levels for each probe were established by three-colour FISH (test probe in 
combination with the ETV6-RUNX1 ES probe) using three normal control 
peripheral blood slides per probe. As ETV6 and RUNX1 were scored on each slide, 
the values for these two probes were based on 12 slides in total. We used a 
cutoff = mean + 2 × standard deviation. Cutoff levels for each probe (in com- 
bination with the ETV6-RUNX1 fusion) were established using 3-12 normal 
control peripheral blood slides. The cutoff levels for three-colour FISH experi- 
ments are given in Supplementary Table 2. Cutoff levels for four-colour FISH were 
the same as above, except for Texas red probes. The cutoff for loss of one signal 
using a fourth probe detected with Texas red was 6.9%, because of spectral overlap 
between Texas red (used to detect digoxigenin-labelled probes) and Spectrum 
orange (used to label RUNX1 in the commercial ETV6-RUNX1 ES probe). 
However, in most cases we used three-colour FISH as a cross-check to infer 
whether clones below this cutoff were real (Supplementary Fig. 2). Only fusion- 
gene-positive cells that also showed the small extra red signal (generated by dis- 
ruption of RUNX1) were used to calculate the relative frequencies of the various 
subclones. 
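
As a worked illustration of the cutoff rule stated above (cutoff = mean + 2 × standard deviation of the control-slide scores), consider the following minimal sketch. The function name `fish_cutoff` and the control values are hypothetical, and the use of the sample standard deviation (n − 1 denominator) is an assumption, because the text does not say which estimator was used.

```python
from statistics import mean, stdev

def fish_cutoff(control_percentages):
    """Cutoff (%) = mean + 2 x standard deviation of control-slide scores.

    Uses the sample standard deviation (n - 1 denominator); this choice is
    an assumption, not something stated in the Methods.
    """
    return mean(control_percentages) + 2 * stdev(control_percentages)

# Hypothetical false-positive rates (%) on three normal control slides.
controls = [0.5, 1.0, 1.5]
cutoff = fish_cutoff(controls)  # mean 1.0, sample s.d. 0.5
```

With only three control slides per probe the standard deviation is estimated from very few points, which may be why the four-colour experiments were cross-checked against three-colour FISH.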

Genome mapping analysis. Mapping analysis was performed using 500 ng of 
tumour DNA. DNA was prepared according to manufacturer’s instructions using 
the GeneChip mapping 500K assay protocol for hybridization to GeneChip 
Mapping 250K Nsp and Sty arrays (Affymetrix). Briefly, genomic DNA was 
digested in parallel with restriction endonucleases NspI and StyI, ligated to an 
adaptor, and subjected to polymerase chain reaction (PCR) amplification with 
adaptor-specific primers. The PCR products were digested with DNase I and 
labelled with a biotinylated nucleotide analogue. The labelled DNA fragments 
were hybridized to the microarray, stained by streptavidin-phycoerythrin 
conjugates, and washed using the Affymetrix Fluidics Station 450 then scanned 
with a GeneChip scanner 3000 7G. 

Copy number and LOH analysis. SNP genotypes were obtained using Affymetrix 
GCOS software (version 1.4) to obtain raw feature intensity and Affymetrix 
GTYPE software (version 4.0) using the BRLMM algorithm to derive SNP geno- 
types. The samples were analysed using CNAG 3.0 (http://plaza.umin.ac.jp/ 
genome), comparing tumour sample with unpaired control DNA to determine 
copy number and loss of heterozygosity (LOH) caused by imbalance*’. The 
positions of regions of LOH were identified using the University of California Santa 
Cruz (UCSC) Genome Browser, May 2004 Assembly (http://genome.ucsc.edu/ 
cgi-bin/hgGateway). 

Combined fluorescence immunophenotype and FISH. Bone marrow and 
spleen cells from leukaemic mice were cytospun and used for combined or tri- 
ple-colour immunophenotype/FISH analysis as previously described*. Briefly, 
cells were air dried and fixed in acetone before incubating in primary biotinylated 
mouse anti-human CD45 (Clone F10-89-4) and detecting with Avidin-AMCA 
(Vector Laboratories). After antibody staining, the slides were hybridized with the 
Vysis ETV6-RUNX1 ES fusion gene FISH probe as described above. Cells were 
viewed using a Zeiss Axioskop fluorescence microscope fitted with a dual bandpass 
FITC and rhodamine filter, as well as individual DAPI (for AMCA immunophe- 
notype), FITC, rhodamine and Cy5. Images were captured using a charge-coupled 
device (Photometrics) and fluorescence signals merged and analysed using 
SmartCapture X software version 2.6.2 (Digital Scientific). 

Cell separation, phenotyping and sorting. Total mononuclear cells were iso- 
lated by Ficoll gradient centrifugation and directly cryopreserved in DMSO for 
later use. After thawing, dead cells were evaluated and excluded by FACS after 
staining with Hoechst 33258 (Invitrogen). For sorting CD34 and CD20 positive 
and negative subsets, samples were stained with either mouse anti-human CD34 
(IgG,, Dako) or CD20 (IgG,, Southern Biotech) followed by anti-mouse IgG 
labelled with Pacific blue (Invitrogen). Nonspecific binding of antibody was 
assessed using mouse IgG, isotype control stained samples (Dako). Sorting was 
performed on a BD FACSAria with analysis by Becton Dickinson FACSDiVa 
software. Samples were gated on forward- and side-scatter plots for mononuclear 
cells and further gated in forward-scatter height versus area to exclude clumped 
cells. Before xeno-transplantation, some cells were stained with anti-CD19 PE
(BD Pharmingen), CD34 FITC (BD Pharmingen) and CD38 APC (BD Pharmingen).
CD34⁺CD38⁻/lowCD19⁺ and pro-B CD34⁺CD38⁺CD19⁺ cells were
purified by flow cytometry (in this case, using MoFlo, Dako). Data acquisition
and analysis were done with Summit (Dako) software. For multi-colour cell 
sorting ‘fluorescence minus one’ controls were used to determine positive and 
negative staining boundaries’. Human cells regenerating in mice were identified 
by staining with anti-CD45 PeCy7 (BD Pharmingen). 

NOD/SCID mouse transplantation. NOD/SCID IL2Rγnull mice that lack any B,
T and natural killer cell activity were bred and maintained at the Weatherall 
Institute of Molecular Medicine animal facility in accordance with Home Office 
regulations. Animals were handled under sterile conditions. Transplantations of 
2 X 10°-10° cells were performed by intra-tibial injections in 7-14-week-old mice. 
Recipients received 250 cGy of total body irradiation before cell injection.
Peripheral engraftment was assessed at 9-10 weeks after transplantation and, if
peripheral engraftment was >2%, mice were killed. Further analysis included the
assessment of bone marrow/spleen engraftment, FISH analysis, histological analysis
and serial transplantation. For serial transplantations, recovered bone marrow
cells were stained with human CD45 to detect human engraftment. An equivalent 
of 2 X 10° to 2 X 10° human cells was transplanted by intra-tibial injections. 

May-Grünwald Giemsa staining. The histological analysis of patient samples
and mouse bone marrow was performed by May-Grünwald Giemsa staining of
bone marrow smears, bone marrow cytospin preparations and spleen swabs.
Slides were analysed on an Olympus BX60 microscope.

Atom-by-atom spectroscopy at graphene edge


Kazu Suenaga¹ & Masanori Koshino¹


The properties of many nanoscale devices are sensitive to local 
atomic configurations, and so elemental identification and elec- 
tronic state analysis at the scale of individual atoms is becoming 
increasingly important. For example, graphene is regarded as a 
promising candidate for future devices, and the electronic properties 
of nanodevices constructed from this material are in large part 
governed by the edge structures’. The atomic configurations at gra- 
phene boundaries have been investigated by transmission electron 
microscopy and scanning tunnelling microscopy~*, but the elec- 
tronic properties of these edge states have not yet been determined 
with atomic resolution. Whereas simple elemental analysis at the 
level of single atoms can now be achieved by means of annular dark 
field imaging” or electron energy-loss spectroscopy®’, obtaining 
fine-structure spectroscopic information about individual light 
atoms such as those of carbon has been hampered by a combination 
of extremely weak signals and specimen damage by the electron 
beam. Here we overcome these difficulties to demonstrate site- 
specific single-atom spectroscopy at a graphene boundary, enabling 
direct investigation of the electronic and bonding structures of the 
edge atoms—in particular, discrimination of single-, double- and 
triple-coordinated carbon atoms is achieved with atomic resolution. 
By demonstrating how rich chemical information can be obtained 
from single atoms through energy-loss near-edge fine-structure ana- 
lysis’, our results should open the way to exploring the local elec- 
tronic structures of various nanodevices and individual molecules. 
A low-voltage scanning transmission electron microscope (STEM)
was used for the single-atom spectroscopy’. Flakes were cleaved from
synthetic highly oriented pyrolytic graphite (HOPG) and put onto
microgrids for energy-loss near-edge fine structure (ELNES)
analysis. STEM annular dark field (ADF) images indicate that the graphene
flakes have open and active edges’ and that the edges are steadily
etched by the incident electron beam when the probe-scanning is
repeated at the same region (Supplementary Fig. 1). The accelerating
voltage used here (60 kV) is below the critical energy predicted for
severe knock-on damage’’ and therefore the carbon atoms in bulk are
mostly stable. Only the edge atoms are mobile during the observation, 
as indicated by the wiggling contrast frequently observed at the edge 
regions. The fast Fourier transformation of an ADF image of few-layer 
graphene shows that the spatial resolution of the experimental set-up is 
better than 0.106 nm (inset to Supplementary Fig. 1a) and so the hexa- 
gonal network of carbon atoms, separated by about 0.14 nm, is clearly 
visible in a monolayer region (Supplementary Fig. 1b). A probe of the 
same size and brightness was used for the following ELNES analysis. 

Figure 1a shows a typical ADF image of the edge region of a single
graphene layer. The hexagonal network of carbon atoms in bulk is visible 
on the right-hand side of the image and the vacuum region appears in 
black on the left-hand side. The possible carbon atom positions derived 
from the local intensity maxima of ADF signals are marked by yellow 
circles after an image-smoothing process in Fig. 1b. There is strong wiggle 
contrast at the edge regions and some of the atom positions cannot be 
completely identified. We note that some of the hexagonal networks are 
imperfect and considerably reconstructed at the edge region. 
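
The idea of marking candidate atom positions at local intensity maxima of a smoothed ADF image can be illustrated with a minimal sketch. The smoothing filter and threshold used by the authors are not stated, so the 3 x 3 mean filter, the strict eight-neighbour maximum test, the threshold value and the synthetic test image below are all assumptions chosen for illustration only.

```python
from typing import List, Tuple

Image = List[List[float]]


def box_smooth(img: Image) -> Image:
    """3x3 mean filter; border pixels average over the neighbours that exist."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [
                img[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
            ]
            out[r][c] = sum(window) / len(window)
    return out


def local_maxima(img: Image, threshold: float) -> List[Tuple[int, int]]:
    """Return (row, col) of pixels at or above threshold that strictly exceed
    every one of their existing neighbours."""
    rows, cols = len(img), len(img[0])
    peaks = []
    for r in range(rows):
        for c in range(cols):
            value = img[r][c]
            if value < threshold:
                continue
            neighbours = [
                img[rr][cc]
                for rr in range(max(0, r - 1), min(rows, r + 2))
                for cc in range(max(0, c - 1), min(cols, c + 2))
                if (rr, cc) != (r, c)
            ]
            if all(value > n for n in neighbours):
                peaks.append((r, c))
    return peaks


def make_test_image() -> Image:
    """Hypothetical 8x8 image with two bright, cross-shaped 'atoms'."""
    img = [[0.0] * 8 for _ in range(8)]
    for r, c in [(2, 2), (5, 5)]:
        img[r][c] = 9.0
        for dr, dc in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            img[r + dr][c + dc] = 3.0
    return img


candidates = local_maxima(box_smooth(make_test_image()), threshold=2.0)
print(candidates)  # candidate atom positions as (row, col) pairs
```

With real data, atoms that move during acquisition (the wiggling contrast noted above) would blur or split such maxima, which is consistent with some edge positions not being identifiable.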

The typical ELNES spectra of carbon K (1s)-edge are displayed with 
their corresponding atomic positions in Fig. 1c. Figure 1d shows three 
characteristic carbon K-edge fine structures extracted using sequential 
electron energy-loss spectroscopy (EELS) with probe-scanning (known 
as the spectrum-image mode)''. The spectrum in green was recorded at 
an atomic position in bulk (indicated by a green circle and arrow in
Fig. 1b) as a reference. This spectrum exhibits the features of typical sp²
coordinated carbon atoms, such as the sharp π* peak around 286 eV and
the exciton peak of σ* at 292 eV. These features are in good agreement
with the previously reported spectra recorded from a bulk graphite
specimen”. The spectrum in blue was recorded from an edge atom
located at the border of the hexagonal network with two-coordination,
as illustrated in Fig. 1c. Remarkably, this spectrum has an extra peak
around 282.6 ± 0.2 eV (labelled D in Fig. 1d), with the π* peak having
reduced intensity. Also, the exciton peak intensity is considerably reduced
and broadened compared to the bulk spectrum (marked by open circles).

Figure 1 | Graphene edge spectroscopy. a, ADF
image of single graphene layer at the edge region.
No image-processing has been done. Atomic
positions are marked by circles in a smoothed
image (b). Scale bars, 0.5 nm. d, ELNES of carbon K
(1s) spectra taken at the colour-coded atoms
indicated in b. Green, blue and red spectra
correspond to the normal sp² carbon atom, a
double-coordinated atom and a single-coordinated
atom, respectively. These different states of atomic
coordination are marked by coloured arrows in
a and b and illustrated in c. CCD, charge-coupled
device.

The spectrum in red shows similar features, also with a weaker π*
peak and a broadened σ* peak. Its extra peak occurs at a different energy
position of 283.6 ± 0.2 eV (labelled S in Fig. 1d). It is extremely difficult
to assign the atomic position completely for this red spectrum because 
the spectrum disappears quickly and is not fully reproducible. The 
edge region of the specimen tends to be strongly damaged and the 
edge morphology frequently changes after recording the spectrum 
image. Therefore, we can reasonably infer that this energy state must 
be somehow damage-related. One of the possible models for this edge 
structure is the Klein edge'*"*. The edge atom indicated in red in Fig. 1b 
is indeed single-bonded to its neighbour. The structure should be very 
unstable under the incident electron beam and so it may also explain 
the wiggling contrast often observed at the graphene edge. 
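
As a rough plausibility check on whether peaks D and S are distinguishable, their separation can be compared with the quoted uncertainties and with the energy resolution given in the Methods Summary (around 0.4 eV). The sketch below assumes, without support from the text, that the ± 0.2 eV values are independent standard uncertainties that add in quadrature.

```python
import math

# Values quoted in the text.
PEAK_D_EV = 282.6            # extra peak of the double-coordinated atom
PEAK_S_EV = 283.6            # extra peak of the single-coordinated atom
PEAK_UNCERTAINTY_EV = 0.2    # quoted +/- on each peak position
ENERGY_RESOLUTION_EV = 0.4   # approximate energy resolution (Methods Summary)


def peak_separation(e1: float, e2: float, u1: float, u2: float):
    """Return (separation, combined uncertainty, ratio).

    Assumption: u1 and u2 are independent standard uncertainties,
    so they are combined in quadrature.
    """
    delta = abs(e2 - e1)
    combined = math.hypot(u1, u2)
    return delta, combined, delta / combined


delta, combined, ratio = peak_separation(
    PEAK_D_EV, PEAK_S_EV, PEAK_UNCERTAINTY_EV, PEAK_UNCERTAINTY_EV
)
print(f"separation = {delta:.2f} eV")
print(f"combined uncertainty = {combined:.2f} eV ({ratio:.1f} x)")
print(f"separation / energy resolution = {delta / ENERGY_RESOLUTION_EV:.1f}")
```

Under these assumptions the 1.0 eV separation is several times the combined uncertainty and more than twice the energy resolution, which is consistent with treating D and S as distinct features.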

These spectral features involving peaks D and S have not previously 
been reported, to our knowledge. No fingerprinting method, compar- 
ing against the reference spectra of the existing polymorphic carbon, is 
able to explain them. We therefore performed ELNES simulations to 
correlate the experimental features with different atomic configurations
(Fig. 2). The π* peak shift to lower energy is well reproduced
for the edge atoms in the Klein, zigzag and armchair edge configurations
(Fig. 2a, b and c), in comparison with the bulk carbon atom
(Fig. 2d). The diminished excitonic effect can be confirmed for the
Klein edge (Fig. 2a). The peak shift of around 2 eV is well reproduced for
the zigzag edge (Fig. 2b). In the spectrum of the armchair edge a sharp
peak between π* and σ* is expected (Fig. 2c).

To show an atom-by-atom spectroscopy, we also performed EELS in 
the spectrum-line mode across a graphene edge. The probe scanned 
across the protruded carbon atom—the Klein edge—from the vacuum 
to the bulk region along the dotted line in Fig. 3a. A series of 100 spectra 
were sequentially recorded by scanning the electron probe with a constant
step of about 0.02 nm. The total acquisition time was as short as
50 s. The illustrated model in Fig. 3b shows that eight carbon atoms were
investigated in the spectrum line. Figure 3c shows a profile of ADF 
signals (in red) that was simultaneously recorded with the ELNES spec- 
tra. It shows good agreement with the simulated profile (in blue) show- 
ing eight maxima sequentially corresponding to the eight carbon atoms. 
Although the experimental profile is rather scattered owing to specimen 
instability or a possible inclination of the specimen to the incident 
electron beam, which should produce a slight asymmetry in the profile 
of the carbon doublets, we can deduce the carbon atomic positions 
reasonably well from the line profile and extract the ELNES spectra 
corresponding to each atom. Figure 3d shows ELNES fine structures 
obtained in this way, with the corresponding atoms numbered in 
Fig. 3c (each spectrum presented consists of four spectra in total). 
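
The scan parameters quoted above imply a few derived quantities that help in reading Fig. 3. The following is a small worked sketch using only numbers from the text (100 spectra, a step of about 0.02 nm, 50 s total, four spectra summed per atom, and a carbon-carbon separation of about 0.14 nm); treating the summed window as four full steps is an assumption.

```python
# Values quoted in the text.
N_SPECTRA = 100          # spectra per spectrum line
STEP_NM = 0.02           # probe step between spectra
TOTAL_TIME_S = 50.0      # total acquisition time
SPECTRA_PER_ATOM = 4     # spectra summed for each presented spectrum
CC_DISTANCE_NM = 0.14    # approximate carbon-carbon separation


def scan_summary() -> dict:
    """Derive scan length, dwell time and sampling density from the quoted values."""
    return {
        "scan_length_nm": N_SPECTRA * STEP_NM,
        "dwell_time_s": TOTAL_TIME_S / N_SPECTRA,
        # Assumption: the four summed spectra span four full probe steps.
        "summed_window_nm": SPECTRA_PER_ATOM * STEP_NM,
        "steps_per_bond": CC_DISTANCE_NM / STEP_NM,
    }


summary = scan_summary()
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The line therefore covers about 2 nm at roughly 0.5 s per spectrum, within the 0.1 to 1.0 s range given in the Methods Summary, and each presented spectrum integrates over a window smaller than the carbon-carbon separation.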

The delocalization effect at the carbon K-edge (~290 eV) with an 
incident electron probe of 60 kV is estimated as 0.20-0.25 nm in classical 
theory* and as ~0.12 nm at 300 kV more recently'*. Therefore the EELS 
signals, if combined with the probe size (~0.1 nm), may not be completely
localized at the single atoms on which the probe is exactly
positioned. However, the series of ELNES spectra in Fig. 3d strongly
suggest that site-specific spectroscopy is indeed possible with atomic 
resolution at the graphene edge. The spectrum from atom 1 clearly 
shows peak S at 283.6 eV (indicated by a dotted circle), which is related 
to the Klein edge, but spectrum 2 does not show peak S (it may be a 
minor feature). Spectrum 5 shows a small trace of peak D, which can
reasonably be explained by a possible introduction of the bond-breakage
during the probe-scanning across the atom. Spectrum 8 from an atom
1.5 nm away from the edge shows normal sp² features with the sharp π*
and excitonic σ* peaks, which is very close to the bulk spectrum”.

Figure 2 | ELNES simulations for three graphene edge structures. Carbon
K-edge spectra simulated for the Klein edge (a), zigzag edge (b), armchair edge
(c) and bulk (three-coordinated) atom (d). A core-hole was introduced by
partially removing a 1s electron from the carbon atoms (indicated by pink
shading) to estimate the relative peak shift of the spectra. The reduced exciton
peak found experimentally is well reproduced. The simulated ELNES from the
zigzag and armchair edges show at least a qualitative match with experiments,
although the absolute value for the energy shift cannot be fully confirmed.

We performed intensity mapping of peaks D and S to estimate the 
delocalization effects further. A number of experiments, involving one 
set of spectrum-image and seven sets of spectrum-line on the graphene 
edges, are summarized in Supplementary Figs 3, 4 and 5. Results 
confirm that single-atom spectroscopy at specific sites of the graphene 
edge is indeed feasible with the reduced delocalization effect. 

We found no trace of oxygen at the investigated edges. This may 
contradict a generally accepted concept in which the graphene edge 
can be terminated by -OH or -COOH groups and the edge carbon 
atoms cannot be bared’. In this experiment, in situ etching with con- 
tinuous removal of the carbon edge atoms in vacuum always takes 
place and therefore the edge structures are always kept fresh. 

From this study, we have obtained some practical information
for graphene edge engineering. The open edges involve both
single- and double-coordinated carbon atoms but their specific edge
states are completely localized at the atomic level. Even for triple-coordinated
carbon atoms, slight electronic structure modification,
as indicated by the restricted excitonic effect (or the reduced σ* peak),
may exist near the edge region but it vanishes after 1.5 nm from the
edge front. The properties of graphene nanoribbons with smaller
widths might be governed by the edge effects’®.

Figure 3 | Atom-by-atom spectroscopy across the Klein edge. a, ADF image
of graphene edge (no image-processing). The dotted arrow indicates where the
spectrum-line was made (A to B). Scale bar, 0.5 nm. b, An atomic model of the
investigated edge. c, Line-profile of the ADF counts (in red) recorded
simultaneously with the spectrum-line. For comparison with the simulated
ADF counts (blue), the number of each atom is indicated (from 1 to 8). d, The
carbon K-edge ELNES obtained from each atom across the Klein edge. The
single-coordinated carbon atom (numbered 1) clearly shows peak S.

It is very surprising that the EELS signal delocalization has turned out 
not to be very important for atom-by-atom spectroscopy in the present 
experiment. The EELS signal delocalization should be substantially 
decreased when a lower accelerating voltage is used for the incident 
electron probe®. The delocalization effect with a 30-60 kV incident
probe is only a fraction of that for the normal STEM operation voltage 
at 200-300 kV. Lowering the accelerating voltage of the electron micro- 
scope is therefore very beneficial, reducing the delocalization effect in 
addition to contrast enhancement and damage reduction. 

ELNES analysis from single atoms is highly desirable because the rich 
information it supplies will become accessible from individual atoms at 
any local area. The ELNES fingerprinting method has been widely used 
to determine the electronic/bonding states of unknown materials by 
comparison with the reference spectra of known materials. For example, 
the chemical state of Ce³⁺ or Ce⁴⁺ in metallofullerene molecules has
been clearly discriminated at the single-atom level simply by measuring
the energy shift’”. Here we have demonstrated the possibilities of ELNES
spectra analysis beyond the simple fingerprinting method. Non-bulk
atoms provide peculiar electronic structures and therefore their ELNES
should be completely new (or previously unknown) and cannot be
compared with any existing reference. Further efforts should be made 
to obtain the electronic state information from new ELNES spectra by 
combining atomic resolution imaging with theoretical calculations. 


METHODS SUMMARY 

STEM-EELS experiments. A JEOL 2100F transmission electron microscope with 
the DELTA corrector was operated at 60 kV (ref. 9). The energy resolution was 
around 0.4 eV. We used a probe of 0.1 nm diameter with 20 pA for experiments.
For spectroscopy, we used GIF Quantum", designed for low-voltage operations. 
The convergence angle for incident probe was set to 30 mrad, while the inner angle 
for ADF imaging was around 45-50 mrad, which is equal to the EELS collection 
angle. ELNES analysis was performed at each pixel while the incident probe
was digitally scanned’!. The spectrum-image mode, consisting of a two-dimensional
set of ELNES spectra, takes longer for total acquisition and easily leads to the
destruction of the specimen. Therefore we frequently used the spectrum-line
mode, consisting of a one-dimensional set of ELNES spectra, in this study.
Typical acquisition time is around 0.1 to 1.0 s for each spectrum. A spectrum line
consists of 100 spectra, while a spectrum image typically consists of 12 × 12
spectra (see also Supplementary Fig. 3).
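
As a consistency check on the quoted probe parameters, the relativistic electron wavelength at 60 kV can be combined with the 30 mrad convergence angle to estimate a diffraction-limited probe size. This sketch is not taken from the paper: it assumes that 30 mrad is the convergence semi-angle and that the Rayleigh criterion, d ≈ 0.61 λ/α, is an adequate estimate for an aberration-corrected probe.

```python
import math

# Physical constants (SI units).
H = 6.62607015e-34        # Planck constant, J s
M_E = 9.1093837015e-31    # electron rest mass, kg
Q_E = 1.602176634e-19     # elementary charge, C
C = 2.99792458e8          # speed of light, m/s


def electron_wavelength_m(voltage_v: float) -> float:
    """Relativistic de Broglie wavelength of an electron accelerated through voltage_v."""
    energy_j = Q_E * voltage_v
    momentum = math.sqrt(2 * M_E * energy_j * (1 + energy_j / (2 * M_E * C ** 2)))
    return H / momentum


def diffraction_limited_probe_m(voltage_v: float, semi_angle_rad: float) -> float:
    """Rayleigh-criterion probe size (assumed model, aberrations neglected)."""
    return 0.61 * electron_wavelength_m(voltage_v) / semi_angle_rad


wavelength_pm = electron_wavelength_m(60e3) * 1e12
probe_nm = diffraction_limited_probe_m(60e3, 30e-3) * 1e9
print(f"wavelength at 60 kV: {wavelength_pm:.2f} pm")
print(f"diffraction-limited probe: {probe_nm:.3f} nm")
```

Under these assumptions the estimate comes out close to 0.1 nm, in line with the probe diameter stated above.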

Specimen preparation. Commercially available synthetic HOPG (NT-MDT 
Company) was used for experiments. Some of the flakes were cleaved using 
Scotch tape and then transferred to transmission electron microscope microgrids
following the method developed by Meyer and co-workers’. 

ELNES simulations. A first-principles calculation based on density functional theory (DFT) was
used to estimate energy levels and partial density of states on carbon atoms of
graphene structures. In the discrete variational Xα method, the energy levels and
partial density of states of unoccupied carbon 2p orbitals are estimated from the 
self-consistent charge calculation. To estimate the threshold energy of the carbon 
K-edge, the core-hole effect was considered by employing the transition-state 
approximation method, which configures a half-electron removed from the carbon 
1s orbital and added to an unoccupied orbital’®”’. See also Supplementary Fig. 2.

Directed differentiation of human pluripotent stem
cells into intestinal tissue in vitro

Studies in embryonic development have guided successful efforts
to direct the differentiation of human embryonic and induced 
pluripotent stem cells (PSCs) into specific organ cell types in 
vitro’’. For example, human PSCs have been differentiated into 
monolayer cultures of liver hepatocytes and pancreatic endocrine 
cells** that have therapeutic efficacy in animal models of liver 
disease”* and diabetes’, respectively. However, the generation of 
complex three-dimensional organ tissues in vitro remains a major 
challenge for translational studies. Here we establish a robust and 
efficient process to direct the differentiation of human PSCs into 
intestinal tissue in vitro using a temporal series of growth factor 
manipulations to mimic embryonic intestinal development’. This 
involved activin-induced definitive endoderm formation'!, FGF/ 
Wnt-induced posterior endoderm patterning, hindgut specification
and morphogenesis’*"“, and a pro-intestinal culture system'®’® to 
promote intestinal growth, morphogenesis and cytodifferentia- 
tion. The resulting three-dimensional intestinal ‘organoids’ con- 
sisted of a polarized, columnar epithelium that was patterned into 
villus-like structures and crypt-like proliferative zones that 
expressed intestinal stem cell markers’’. The epithelium contained 
functional enterocytes, as well as goblet, Paneth and enteroendo- 
crine cells. Using this culture system as a model to study human 
intestinal development, we identified that the combined activity of 
WNT3A and FGF4 is required for hindgut specification whereas 
FGF4 alone is sufficient to promote hindgut morphogenesis. Our 
data indicate that human intestinal stem cells form de novo during 
development. We also determined that NEUROG3, a pro-endocrine 
transcription factor that is mutated in enteric anendocrinosis”, is 
both necessary and sufficient for human enteroendocrine cell 
development in vitro. PSC-derived human intestinal tissue should 
allow for unprecedented studies of human intestinal development 
and disease. 

The epithelium of the intestine is derived from a simple sheet of cells 
called the definitive endoderm”. As a first step to generating intestinal 
tissue from PSCs (summarized in Supplementary Fig. 1), we used 
activin A, a nodal-related TGF-β molecule, to promote differentiation
into definitive endoderm as previously described", resulting in up to 
90% of the cells co-expressing the definitive endoderm markers SOX17 
and FOXA2 and fewer than 2% expressing the mesoderm marker 
brachyury (Supplementary Fig. 2a). Using microarray analysis we 
observed a robust activation of definitive endoderm markers, many 
of which were expressed in mouse definitive endoderm from embry- 
onic day (e)7.5 embryos (Supplementary Fig. 3 and Supplementary 
Table 1a, b). We investigated the intrinsic ability of definitive endo- 
derm to form foregut and hindgut lineages by culturing for 7 days 
under permissive conditions and observed that cultures treated with 
activin A for only 3 days were competent to develop into both foregut 
(albumin (ALB)⁺ and PDX1⁺) and hindgut (CDX2) lineages (Fig. 1b,
control). In contrast, treatment with activin A for 4-5 days resulted in
definitive endoderm cultures that were intrinsically anterior in character
and less competent in forming posterior lineages (Supplementary
Fig. 2b).

Figure 1 | FGF4 and WNT3A act synergistically in a temporal and dose-dependent
manner to specify stable posterior endoderm fate. a-d, Activin A
(100 ng ml⁻¹) was used to differentiate H9 human ES cells into definitive
endoderm. Definitive endoderm was treated with the posteriorizing factors 
FGF4 (50 or 500 ng), WNT3A (50 or 500 ng), or both for 6, 48 or 96h. Cells 
were placed in permissive media for 7 days and expression of foregut markers 
(ALB, PDX1) and the hindgut marker (CDX2) were analysed by RT-qPCR 
(a) and immunofluorescence (b-d). The definitive endoderm of controls was 
grown for identical lengths of time in the absence of FGF4 or WNT3A. High 
levels of FGF4+WNT3A for 96h resulted in stable CDX2 expression and lack
of foregut marker expression. Scale bars, 50 μm. Error bars are s.e.m. (n = 3).
*P < 0.05, **P < 0.001, ***P < 0.0001.

Having identified the window of time when definitive endoderm
fate was plastic (day 3 of activin A treatment), we used WNT3A and 
FGF4 to promote hindgut and intestinal specification. Studies in 
mouse, chick and frog embryos have demonstrated that Wnt and 
FGF signalling pathways are required for repressing anterior develop- 
ment and promoting posterior endoderm formation into the midgut 
and hindgut'*""*. Consistent with this, conditioned media containing 
WNT3A was recently shown to promote Cdx2 expression in mouse
embryonic stem (ES)-cell-derived embryoid bodies”’. In human defini- 
tive endoderm cultures, neither factor alone was sufficient to robustly 
promote a posterior fate (Supplementary Fig. 2c); but high concentra- 
tions of both FGF4 and WNT3A (FGF4+WNT3A) induced expres- 
sion of the hindgut marker CDX2 in the definitive endoderm after 48 h 
(Supplementary Fig. 4). However, 48h of FGF4+WNT3A treatment 
did not stably induce a CDX2⁺ hindgut fate and expression of anterior
markers PDX1 and albumin reappeared after cells were cultured in
permissive media for 7 days (Fig. 1a, c). In contrast, 96h of exposure
to FGF4+WNT3A resulted in stable CDX2 expression and absence of
anterior markers (Fig. 1a, d). These findings indicate a previously
unidentified requirement for the synergistic activities of both the
FGF and Wnt pathways in specifying the CDX2⁺ mid/hindgut lineage.

Remarkably, FGF4+WNT3A-treated cultures underwent morphogenesis
that was similar to embryonic hindgut formation. Between 2
and 5 days of FGF4+WNT3A treatment, flat cell sheets condensed into
CDX2⁺ epithelial tubes, many of which budded off to form floating
hindgut spheroids (Fig. 2a-c, Supplementary Fig. 5a-f and Supplementary
Table 2a). Spheroids were similar to e8.5 mouse hindgut
and consisted of uniformly CDX2⁺ polarized epithelium surrounded
by CDX2⁺ mesenchyme (Fig. 2d-g). Spheroids were completely devoid
of albumin- and PDX1-expressing foregut cells (Supplementary Fig. 5h,
i). In vitro gut-tube morphogenesis was never observed in control or
WNT3A-only treated cultures. FGF4-treated cultures had a twofold
expansion of mesoderm and generated 4-10-fold fewer spheroids
(Supplementary Fig. 2c and Supplementary Table 2a), which were
weakly CDX2⁺ and did not undergo further expansion (data not
shown). Together our data support a mechanism for hindgut development
where FGF4 promotes mesoderm expansion and morphogenesis,
whereas FGF4 and WNT3A synergy is required for the specification of
the hindgut lineage.

Figure 2 | Morphogenesis of posterior endoderm into three-dimensional,
hindgut-like spheroids. a, Bright-field images of definitive endoderm cultured
for 96h in media, FGF4, WNT3A or FGF4+WNT3A. FGF4+WNT3A
cultures contained three-dimensional epithelial tubes and free-floating spheres
(black arrows). b, CDX2 immunostaining (green) and nuclear stain (DRAQ5,
blue) on cultures shown in a. Insets show CDX2 staining alone. c, Bright-field
image of hindgut-like spheroids. a-c, Scale bars, 50 μm. d-f, Analysis of CDX2,
basal-lateral laminin and E-cadherin expression demonstrates an inner layer of
polarized, cuboidal, CDX2⁺ epithelium surrounded by non-polarized
mesenchymal CDX2⁺ cells. Scale bar in e is 20 μm. g, CDX2 expression in an
e8.5 mouse embryo (sagittal section). Inset is a magnified view showing that
both hindgut endoderm (E; outlined with a red dashed line) and adjacent
mesenchyme (M) are CDX2 positive (green). FG, foregut; HG, hindgut. Scale
bar, 100 μm.

Importantly, this method for directed differentiation is broadly 
applicable to other PSC lines, as we were able to generate hindgut 
spheroids from both H1 and H9 human ES cell lines and from four 
induced PSC (iPSC) lines that we have generated and characterized 
(Supplementary Figs 3, 5 and 6). The kinetics of differentiation and the 
formation of spheroids were comparable between these lines (Sup- 
plementary Table 2). Two other iPSC lines tested were poor at hindgut 
spheroid formation and line iPSC3.6 also had a divergent transcrip- 
tional profile during definitive endoderm formation (Supplementary 
Fig. 3 and Supplementary Table 2c). 

Whereas in vivo engraftment of PSC-derived cell types, such as pan- 
creatic endocrine cells, has been used to promote maturation”, efficient 
development and maturation of organ tissues in vitro has proven more 
difficult. We investigated whether hindgut spheroids could develop and 
mature into intestinal tissue in vitro using recently described three- 
dimensional culture conditions that support growth and renewal of 
the adult intestinal epithelium’*'®. When placed into this culture 
system, hindgut spheroids developed into intestinal organoids in a 
staged manner that was notably similar to fetal gut development 
(Fig. 3, Supplementary Fig. 5g and Supplementary Fig. 7). In the first 
14 days the simple cuboidal epithelium of the spheroid expanded and 
formed a highly convoluted pseudostratified epithelium surrounded by 
mesenchymal cells (Fig. 3a-c), similar to an e12.5 fetal mouse gut
(Fig. 3f). After 28 days, the epithelium matured into a columnar 
epithelium with villus-like involutions that protrude into the lumen 
of the organoid (Fig. 3d, e). Comparable transitions were observed 
during mouse fetal intestinal development (Fig. 3f, g and Supplementary
Fig. 7). The spheroids expanded up to 40-fold in mass as they
formed organoids (data not shown) and were split and passaged over
9 additional times and cultured for over 140 days with no signs of
growth failure. The cellular gain during that time was up to 1,800-fold
(data not shown), resulting in a total cellular expansion of 72,000-fold
per hindgut spheroid. This directed differentiation was up to 50-fold
more efficient than spontaneous embryoid body differentiation meth- 
ods”’ (Supplementary Fig. 8) and resulted in organoids that were almost 
entirely intestinal (Supplementary Fig. 2e-g) as compared to embryoid 
bodies that contained a mix of neural, vascular and epidermal tissues 
(Supplementary Fig. 8). 
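
The total expansion quoted above is the product of the two stage-wise gains (up to 40-fold during organoid formation and up to 1,800-fold during passaging). The short sketch below reproduces that arithmetic and, as an added illustration not found in the text, converts the total into an equivalent number of population doublings.

```python
import math


def total_fold_expansion(formation_fold: float, passaging_fold: float) -> float:
    """Sequential fold changes multiply."""
    return formation_fold * passaging_fold


# Upper-bound values quoted in the text.
total = total_fold_expansion(40, 1800)

# Illustrative only: equivalent number of doublings for that total expansion.
doublings = math.log2(total)

print(f"total expansion: {total:,.0f}-fold")
print(f"equivalent doublings: {doublings:.1f}")
```

Because both inputs are stated as upper bounds ("up to"), the product is likewise an upper bound rather than a typical value.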

Marker analysis showed that after 14 days in culture, virtually all of 
the epithelium expressed the intestinal transcription factors CDX2, 
KLF5 and SOX9 broadly and was highly proliferative (Fig. 3b, c). By 
28 days, CDX2 and KLF5 remained broadly expressed in over 90% of 
the epithelium (Supplementary Fig. 2), whereas SOX9 became localized 
to pockets of proliferating cells at the base of the villus-like protrusions
(Fig. 3d, e) similar to the intervillus epithelium of fetal mouse intestines
at e16.5 (Fig. 3g and Supplementary Fig. 9). 5-bromodeoxyuridine 
(BrdU) pulse chase and analysis of organoids using a Z-stack series 
of confocal microscopic images showed that epithelial BrdU incorp- 
oration was largely restricted to SOX9-expressing cells in crypt-like 
structures that penetrated into the underlying mesenchyme (Supplementary
Fig. 9). At 28 days, LGR5 is not expressed and ASCL2
(ref. 21) is broadly expressed and not restricted to the SOX9⁺ proliferative
zone. However, organoids cultured until 56 days expressed both
ASCL2 and LGR5 in restricted epithelial domains that appear to overlap
with the SOX9⁺ zone (Fig. 3h-j and Supplementary Fig. 10). This
domain is similar to developing intestinal progenitor domains in vivo,
which ultimately give rise to the stem cell niche in the crypt of
Lieberkühn’>. iPSCs were equally capable of forming intestinal progenitor
domains (Supplementary Fig. 9e). Thus, PSC-derived intestinal
epithelium continued to mature in vitro and develop proliferative
domains with nascent intestinal stem cells.

Figure 3 | Human ES cells and iPSCs form three-dimensional intestine-like
organoids. a, A time course shows that intestinal organoids formed highly
convoluted epithelial structures surrounded by mesenchyme after 13 days (d).
b-e, Intestinal transcription factor expression (KLF5, CDX2, SOX9) and cell
proliferation on serial sections of organoids after 14 and 28 days (serial sections
are b and c, d and e). Ki67, nuclear proliferation antigen. Nuc, nuclei.
f, g, Expression of KLF5, CDX2, and SOX9 in mouse fetal intestine at e14.5
(f) and e16.5 (g) is similar to developing intestinal organoids. The right panels
show separate colour channels for d, e and g (bracket highlights the region
shown in the panels on the right). h-j, Whole mount in situ hybridization of
56-day-old organoids showing epithelial expression of SOX9 (h) and restricted
‘crypt-like’ expression of the stem cell markers LGR5 (i) and ASCL2 (j). Insets
show sense controls for each probe. Scale bars, 20 μm.

Between 18 and 28 days in culture, we observed cytodifferentiation 
of the stratified epithelium into a columnar epithelium containing 
brush borders and all of the major cell lineages of the gut as determined 
by immunofluorescence and quantitative polymerase chain reaction 
with reverse transcription (RT-qPCR) (Fig. 4a-d and Supplementary 
Fig. 11). By 28 days of culture, villin (Fig. 4a) and DPPIV (not shown) 
were localized to the apical surface of the polarized columnar epithe- 
lium and transmission electron microscopy revealed a brush border of 
apical microvilli indistinguishable from those found in mature intest- 
ine (Fig. 4d and Supplementary Fig. 1). Enterocytes had a functional 
peptide transport system and were able to absorb a fluorescently 

Figure 4 | Formation and function of intestinal cell types and regulation of
enteroendocrine differentiation by NEUROG3. a-c, Twenty-eight-day
iPSC-derived organoids were analysed for villin (VIL) and the goblet cell
marker mucin (MUC2) (a), the Paneth cell marker lysozyme (LYZ) (b), or the
endocrine cell marker chromogranin A (CHGA) (c). Nuc, nuclei. d, Electron
micrograph showing an enterocyte with a characteristic brush border with
microvilli (inset). e, Epithelial uptake of the fluorescently labelled dipeptide
β-Ala-Lys-AMCA (arrowheads) indicating a functional peptide transport
system. f-h, Adenoviral expression of NEUROG3 (Ad-NEUROG3) causes a
fivefold increase in CHGA+ cells compared to a GFP control (Ad-GFP). n = 4
biological samples; *P = 0.005. i-k, Organoids were generated from human ES
cells that were stably transduced with shRNA-expressing lentiviral vectors.
Compared to control shRNA organoids, NEUROG3 shRNA organoids had a
95% reduction in the number of CHGA+ cells. n = 3 for shRNA controls and
n = 5 for NEUROG3-shRNA; *P = 0.018. Scale bar in a is 10 µm; all others are
20 µm. Error bars are s.e.m.


labelled dipeptide (Fig. 4e). Cell counting revealed that the epithelium
contained approximately 15% MUC2+ goblet cells, which secrete
mucins into the lumen of the organoid, 18% lysozyme-positive cells,
which are indicative of Paneth cells, and ~1% chromogranin-A-expressing
enteroendocrine cells (Fig. 4 and Supplementary Fig. 11g).
MUC2 and lysozyme staining indicated that the goblet and Paneth cells
in 28-day organoids are immature (Fig. 4a, b). However, in organoids
that were passaged over 100 days, all cells had acquired a more mature
phenotype and Paneth cells were often localized in crypt-like structures
(Supplementary Fig. 12b, c). RT-qPCR confirmed the presence of
additional markers of differentiated enterocytes (FABP2; also known
as IFABP) and Paneth cells (MMP7) (Supplementary Fig. 11). Individual
organoids seemed to be a mix of proximal intestine (GATA4+/GATA6+)
and distal intestine (GATA4−/GATA6+; HOXA13-expressing)
(Supplementary Figs 11 and 13). Thus, directed differentiation
of PSCs into intestinal tissue in vitro is highly efficient in
generating three-dimensional intestinal tissue containing crypt-like
progenitor niches, villus-like domains and all of the differentiated cell
types of the intestinal epithelium.

Intestinal organoids contained a mesenchymal layer that developed
along with the epithelium in a staged manner similar to embryonic
development (Supplementary Fig. 14). Mesenchyme probably
came from the 2% of mesoderm cells that were present after activin
differentiation, which expanded up to 10% in FGF4-treated hindgut
cultures (Supplementary Fig. 2). At 14 days, organoids broadly
expressed mesenchymal markers including FOXF1 and vimentin
(Supplementary Fig. 14), similar to an e12.5 embryonic intestine
(Supplementary Fig. 7). We also observed vimentin/smooth muscle actin
(SMA; also known as ACTA2) double-positive cells indicative of
intestinal subepithelial myofibroblasts. By 28 days, we observed a layer
of SMA+/desmin+ double-positive cells, indicating smooth muscle,
and desmin+/vimentin+ fibroblasts. The fact that intestinal mesenchyme
differentiation coincided with differentiation of the overlying
epithelium indicates that epithelial-mesenchymal crosstalk may be
important in the development of PSC-derived intestinal organoids.

The molecular basis of congenital malformations in humans is often
inferred from functional studies in model organisms. For example,
neurogenin 3 (NEUROG3) was investigated as a candidate gene
responsible for congenital loss of intestinal enteroendocrine cells in
humans because of its known role in enteroendocrine cell development
in mouse. However, it has been impossible to directly investigate
the role of NEUROG3 during human intestinal development. We
therefore performed gain- and loss-of-function analyses to investigate
the role of NEUROG3 during human enteroendocrine cell development
(Fig. 4 and Supplementary Fig. 15). NEUROG3 was overexpressed in
28-day human organoids using adenoviral (Ad)-mediated transduction.
After 7 days, approximately 5% of cells were GFP+ and
Ad-NEUROG3-GFP-infected organoids contained fivefold more
chromogranin A+ endocrine cells than control organoids (Ad-enhanced
GFP (eGFP)) (Fig. 4f-h and Supplementary Fig. 15), demonstrating
that NEUROG3 expression is sufficient to promote an enteroendocrine
cell fate. To knock down endogenous NEUROG3, we generated human
ES cell lines by transducing cells with NEUROG3 short hairpin
(sh)RNA-expressing lentiviral vectors. NEUROG3 mRNA levels were
knocked down by 63% and this resulted in a 90% reduction in the
number of enteroendocrine cells (Fig. 4i-k and Supplementary Fig.
15d-f), demonstrating that intestinal enteroendocrine cell development
is highly dependent on NEUROG3 expression. This indicates that
partial loss-of-function mutations in human NEUROG3 would be
sufficient to cause a marked reduction in enteroendocrine cell numbers.
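
The fold increase and percentage reduction quoted above are simple ratios of group means, and the figure legend reports error bars as s.e.m. The sketch below shows one way such summary values could be computed from per-sample cell counts. The function names and all counts are hypothetical and chosen only to reproduce a fivefold increase and a 90% reduction; they are not data from the study.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean (sample s.d. divided by sqrt(n))."""
    return stdev(values) / math.sqrt(len(values))


def fold_change(control, treated):
    """Ratio of the treated group mean to the control group mean."""
    return mean(treated) / mean(control)


def percent_reduction(control, treated):
    """Percentage by which the treated mean falls below the control mean."""
    return 100.0 * (1 - mean(treated) / mean(control))


# Hypothetical counts of chromogranin A+ cells per sample (illustration only).
ad_gfp = [10, 12, 8, 10]          # n = 4, control adenovirus
ad_neurog3 = [50, 55, 45, 50]     # n = 4, NEUROG3 adenovirus
control_shrna = [40, 50, 60]      # n = 3
neurog3_shrna = [4, 5, 6, 5, 5]   # n = 5

overexpression_fold = fold_change(ad_gfp, ad_neurog3)
knockdown_reduction = percent_reduction(control_shrna, neurog3_shrna)
control_sem = sem(control_shrna)
```

Reporting the ratio of means keeps the calculation transparent; a per-sample normalization would be an alternative design if samples were paired.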

This is the first report, to our knowledge, demonstrating that human 
PSCs can be efficiently directed to differentiate in vitro into human 
tissue with a three-dimensional architecture and cellular composition 
remarkably similar to the fetal intestine. Moreover, PSC-derived 
human intestinal tissue undergoes maturation in vitro, developing 
intestinal stem cells and acquiring both absorptive and secretory func- 
tionality. This system allows for functional studies to investigate the 
molecular basis of human congenital gut defects in vitro and to generate 
intestinal tissue for eventual transplantation-based therapy for diseases 
such as necrotizing enterocolitis, inflammatory bowel diseases and 
short-gut syndromes. The ability to generate human intestinal tissues 
should also greatly facilitate future studies of intestinal stem cells and 
drug design to enhance absorption and bioavailability. 


METHODS SUMMARY 

Generation of human intestinal organoids. Human ES cells and iPSCs were
maintained on Matrigel (BD Biosciences) in mTeSR1 medium without feeders.
Differentiation into definitive endoderm was carried out as previously described.
Briefly, a 3-day activin A (R&D Systems) differentiation protocol was used. Cells
were treated with activin A (100 ng ml⁻¹) for three consecutive days in RPMI 1640
medium (Invitrogen) with increasing concentrations of 0%, 0.2% and 2% HyClone
defined fetal bovine serum (dFBS; Thermo Scientific). For hindgut differentiation,
definitive endoderm cells were incubated in 2% dFBS-DMEM/F12 with 500 ng ml⁻¹
FGF4 and 500 ng ml⁻¹ WNT3A (R&D Systems) for up to 4 days. Between 2
and 4 days of treatment with growth factors, three-dimensional floating spheroids
formed and were then transferred into three-dimensional cultures previously
shown to promote intestinal growth and differentiation. Briefly, spheroids were
embedded in Matrigel (BD Biosciences) containing 500 ng ml⁻¹ R-Spondin1 (R&D
Systems), 100 ng ml⁻¹ Noggin (R&D Systems) and 50 ng ml⁻¹ EGF (R&D
Systems). After the Matrigel solidified, medium (advanced DMEM/F12;
Invitrogen) supplemented with L-glutamine, 10 µM HEPES, N2 supplement
(R&D Systems), B27 supplement (Invitrogen) and penicillin/streptomycin,
containing growth factors, was overlaid and replaced every 4 days.

Full Methods and any associated references are available in the online version of
the paper at www.nature.com/nature.

METHODS


Maintenance of PSCs. Human ES cells and induced pluripotent stem cells were
maintained on Matrigel (BD Biosciences) in mTeSR1 medium. Cells were
passaged approximately every 4 days, depending on colony density. To passage
PSCs, they were washed with DMEM/F12 medium (no serum) (Invitrogen) and
incubated in DMEM/F12 with 1 mg ml⁻¹ dispase (Invitrogen) until colony edges
started to detach from the dish. The dish was then washed 3 times with DMEM/F12
medium. After the final wash, DMEM/F12 was replaced with mTeSR1.
Colonies were scraped off the dish with a cell scraper, gently triturated into
small clumps and passaged onto fresh Matrigel-coated plates.

Differentiation of PSCs into definitive endoderm. Differentiation into definitive
endoderm was carried out as previously described. Briefly, a 3-day activin A
(R&D Systems) differentiation protocol was used. Cells were treated with activin A
(100 ng ml⁻¹) for three consecutive days in RPMI 1640 medium (Invitrogen) with
increasing concentrations of 0%, 0.2% and 2% HyClone defined fetal bovine serum
(dFBS; Thermo Scientific).

Differentiation of definitive endoderm in permissive media. After differentiation
into definitive endoderm, cells were incubated in DMEM/F12 plus 2% dFBS
with either 0, 50 or 500 ng ml⁻¹ FGF4 and/or 0, 50 or 500 ng ml⁻¹ WNT3A (R&D
Systems) for 6, 48 or 96 h. Cultures were then grown in permissive medium
consisting of DMEM plus 10% FBS for an additional 7 days.

Directed differentiation into hindgut and intestinal organoids. After differentiation
into definitive endoderm, cells were incubated in 2% dFBS-DMEM/F12
with either 50 or 500 ng ml⁻¹ FGF4 and/or 50 or 500 ng ml⁻¹ WNT3A (R&D
Systems) for 2-4 days. After 2 days of treatment with growth factors,
three-dimensional floating spheroids were present in the culture. Three-dimensional
spheroids were transferred into a previously described in vitro system that
supports intestinal growth and differentiation. Briefly, spheroids were embedded in
Matrigel (BD Biosciences; no. 356237) containing 500 ng ml⁻¹ R-Spondin1
(R&D Systems), 100 ng ml⁻¹ Noggin (R&D Systems) and 50 ng ml⁻¹ EGF
(R&D Systems). After the Matrigel solidified, medium (advanced DMEM/F12;
Invitrogen) supplemented with L-glutamine, 10 µM HEPES, N2 supplement
(R&D Systems), B27 supplement (Invitrogen) and penicillin/streptomycin,
containing growth factors, was overlaid and replaced every 4 days.

Generation and characterization of iPSC lines. Normal human skin keratinocytes
(HSKs) were obtained from donors with informed consent (Cincinnati
Children’s Hospital Medical Center (CCHMC) Institutional Review Board protocol
CR1_2008-0899). Normal HSKs were isolated from punch biopsies following
trypsinization and subsequent culture on irradiated NIH3T3 feeder cells in F
medium. For iPSC generation, normal HSKs were transduced on two consecutive
days with a 1:1:1:1 mix of recombinant RD114-pseudotyped retroviruses
expressing OCT4, SOX2, KLF4 and MYC in the presence of 8 µg ml⁻¹ polybrene.
Twenty-four hours after the second transduction the virus mix was replaced
with fresh F medium and cells were incubated for an additional three days. Cells
were then trypsinized and seeded into 6-well dishes containing 1.875 × 10⁵ irradiated
mouse fibroblasts per well and Epilife medium. On the following day,
medium was replaced with DMEM/F12 50:50 medium supplemented with 20%
knockout serum replacement, 1 mM L-glutamine, 0.1 mM β-mercaptoethanol, 1×
non-essential amino acids, 4 ng ml⁻¹ basic fibroblast growth factor and 0.5 mM
valproic acid. Morphologically identifiable iPSC colonies arose after 2-3 weeks
and were picked manually, expanded and analysed for expression of the human PSC
markers NANOG and DNMT3B, and with antibodies against the Tra1-60 and
Tra1-81 antigens. Early-passage iPSC lines were adapted to feeder-free culture
conditions consisting of maintenance in mTeSR1 (Stem Cell Technologies) in culture
dishes coated with Matrigel (BD Biosciences) and lines were karyotyped.

Microarray analysis of human ES cells, iPSCs and definitive endoderm cultures.
For microarray analysis, RNA was isolated from undifferentiated and 3-day
activin-treated human ES cell and iPSC cultures and used to create target DNA for
hybridization to Affymetrix Human 1.0 Gene ST Arrays using standard procedures
(Affymetrix). Independent biological triplicates were performed for each cell
line and condition. Affymetrix microarray Cel files were subjected to RMA
normalization in GeneSpring 10.1. Probe sets were first filtered for those that are
overexpressed or underexpressed and then subjected to statistical analysis for
differential expression by twofold or more between undifferentiated and differentiated
cultures with P < 0.05 using Student’s t-test. Log2 gene expression ratios
were then subjected to hierarchical clustering using the standard correlation
distance metric as implemented in GeneSpring.
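
The differential-expression filter described above (a change of twofold or more, tested with Student's t-test at P < 0.05 on biological triplicates) can be sketched in a few lines. This is an illustration under stated assumptions and not the GeneSpring procedure: the input is taken to be already-normalized log2 intensities, the function names and probe IDs are hypothetical, and the P < 0.05 cutoff is applied by comparing the pooled-variance t statistic with 2.776, the two-sided 5% critical value for 4 degrees of freedom (3 + 3 − 2).

```python
import math
from statistics import mean, variance

# Two-sided 5% critical value of Student's t for 4 degrees of freedom.
T_CRIT_DF4 = 2.776


def student_t(a, b):
    """Pooled-variance (Student's) t statistic for two independent samples."""
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))


def differential_probes(log2_expr, min_fold=2.0, t_crit=T_CRIT_DF4):
    """Return {probe: log2 ratio} for probes passing both filters.

    log2_expr maps a probe ID to a pair (undifferentiated, differentiated)
    of triplicate log2 intensities. The default t_crit is valid only for
    two groups of three replicates.
    """
    hits = {}
    for probe, (undiff, diff) in log2_expr.items():
        log2_ratio = mean(diff) - mean(undiff)
        if abs(log2_ratio) < math.log2(min_fold):
            continue  # less than a twofold change in either direction
        if abs(student_t(diff, undiff)) > t_crit:
            hits[probe] = log2_ratio
    return hits


# Hypothetical probes and intensities, for illustration only.
example = {
    "probe_up": ([5.0, 5.1, 4.9], [8.0, 8.2, 7.9]),
    "probe_flat": ([6.0, 6.1, 5.9], [6.1, 6.0, 6.2]),
    "probe_noisy": ([4.0, 7.0, 5.5], [6.0, 9.5, 5.0]),
}
hits = differential_probes(example)
```

In this toy input only `probe_up` passes: `probe_flat` fails the fold-change filter and `probe_noisy` changes more than twofold on average but is too variable to pass the t-test.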

Adenoviral-mediated expression of NEUROG3. Adenoviral plasmids were
obtained from Addgene and particles were generated as previously described.
Transduction was done on 28-day organoids that were removed from Matrigel,
manually bisected, then incubated in Ad-GFP or Ad-NEUROG3 viral supernatant
and medium at a 1:1 ratio for approximately 4 h. Organoids were then re-embedded
in Matrigel and incubated overnight with viral supernatant and medium at a 1:1
ratio. The next day, fresh organoid medium was placed on the cultures and was
changed as described until the end of the experiment.

shRNA knockdown human ES cell lines. GipZ shRNA lentiviral vectors were
obtained from Open Biosystems (GipZ-NEUROG3 Open Biosystems clone no.
v2lhs_309089; v2lhs_309091; v2lhs_309093; v2lhs_309092 and GipZ-Control;
Open Biosystems clone no. RHS4346). The CCHMC Viral Vector Core produced
high-titre lentiviral particles for each plasmid. Low-passage H9 human ES cells
were dissociated into a single-cell suspension using Accutase, spun down and
resuspended in mTeSR1 containing 10 µM Y-27632. Cells were plated at low density
and incubated with lentivirus for 24 h. For the NEUROG3 shRNA knockdown line,
particles from all four vectors were used. mTeSR1 was replaced daily, and after 72 h
selection for puromycin-resistant (2-4 µg ml⁻¹) human ES cells was carried out.
Puromycin-resistant colonies were routinely maintained and passaged in
mTeSR1 plus puromycin (4 µg ml⁻¹).

β-Ala-Lys-AMCA uptake. β-Ala-Lys-AMCA was purchased from BioTrend
Chemicals and was resuspended in water. Intestinal organoids were cut in half using
a scalpel and were incubated for four hours in advanced DMEM/F12 plus 24 µM
β-Ala-Lys-AMCA. Following incubation, organoids were washed several times in
PBS, embedded in OCT freezing medium and frozen at −70 °C. Ten-micrometre
cryosections were cut and processed for standard immunohistochemistry.

Tissue processing, immunohistochemistry and microscopy. Tissues were fixed
for 1 h to overnight in 4% paraformaldehyde or 3% glutaraldehyde for transmission
electron microscopy (TEM). Cultured PSCs and definitive endoderm cells
were stained directly. Hindgut and intestinal organoids were embedded in paraffin,
epoxy resin LX-112 (Ladd Research), or frozen in OCT. Sections were cut at
6-10 µm for standard microscopy and 0.1 µm for TEM. TEM sections were stained
with uranyl acetate. Paraffin sections were deparaffinized, subjected to antigen
retrieval, blocked in the appropriate serum (5% serum in 1× PBS plus 0.5%
Triton-X) for 30 min, and incubated with primary antibody overnight at 4 °C.
Slides were washed and incubated in secondary antibody in blocking buffer for
2 h at room temperature (23 °C). For a list of antibodies used and dilutions, see
Supplementary Table 3. Slides were washed and mounted using Fluormount-G.
Confocal images were captured on a Zeiss LSM510 and Z-stacks were analysed and
assembled using AxioVision software. A Hitachi H7600 transmission electron
microscope was used to capture images.

RNA isolation, RT-qPCR. RNA was isolated using the Nucleospin II RNA isolation
kit (Clontech). Reverse transcription was carried out using the SuperScriptIII
Supermix (Invitrogen) according to the manufacturer’s protocol. Finally, qPCR was
carried out using Quantitect SybrGreen MasterMix (Qiagen) on a Chromo4 Real-
Time PCR (BioRad). PCR primer sequences were typically obtained from
qPrimerDepot (http://primerdepot.nci.nih.gov/). Primer sequences are available
upon request.

A unique chromatin signature uncovers early
developmental enhancers in humans 


Alvaro Rada-Iglesias, Ruchi Bajpai, Tomek Swigut, Samantha A. Brugmann, Ryan A. Flynn & Joanna Wysocka


Cell-fate transitions involve the integration of genomic informa- 
tion encoded by regulatory elements, such as enhancers, with the 
cellular environment. However, identification of genomic
sequences that control human embryonic development represents
a formidable challenge. Here we show that in human embryonic
stem cells (hESCs), unique chromatin signatures identify two dis- 
tinct classes of genomic elements, both of which are marked by the 
presence of chromatin regulators p300 and BRG1, monomethylation
of histone H3 at lysine 4 (H3K4me1), and low nucleosomal
density. In addition, elements of the first class are distinguished by 
the acetylation of histone H3 at lysine 27 (H3K27ac), overlap with 
previously characterized hESC enhancers, and are located proxi- 
mally to genes expressed in hESCs and the epiblast. In contrast, 
elements of the second class, which we term ‘poised enhancers’, are 
distinguished by the absence of H3K27ac, enrichment of histone 
H3 lysine 27 trimethylation (H3K27me3), and are linked to genes 
inactive in hESCs and instead are involved in orchestrating early 
steps in embryogenesis, such as gastrulation, mesoderm formation 
and neurulation. Consistent with the poised identity, during dif- 
ferentiation of hESCs to neuroepithelium, a neuroectoderm- 
specific subset of poised enhancers acquires a chromatin signature 
associated with active enhancers. When assayed in zebrafish 
embryos, poised enhancers are able to direct cell-type and stage- 
specific expression characteristic of their proximal developmental 
gene, even in the absence of sequence conservation in the fish 
genome. Our data demonstrate that early developmental enhancers 
are epigenetically pre-marked in hESCs and indicate an unappre- 
ciated role of H3K27me3 at distal regulatory elements. Moreover, 
the wealth of new regulatory sequences identified here provides an 
invaluable resource for studies and isolation of transient, rare cell 
populations representing early stages of human embryogenesis. 

Recent reports demonstrated that active enhancers can be identified
by epigenomic profiling of p300 (ref. 4), H3K4me1 and
H3K27ac. To characterize the enhancer repertoire of hESCs we
performed chromatin immunoprecipitation coupled to massively parallel
DNA sequencing (ChIP-seq) using antibodies recognizing chromatin
regulators (that is, p300, BRG1) and histone modifications (that
is, H3K4me1, H3K27ac, H3K4me3, H3K27me3) that distinguish distal
elements from proximal promoters (Supplementary Fig. 1). As
expected, previously characterized hESC enhancers (for example,
NANOG (ref. 7) and OCT4 (also called POU5F1)) were bound by
p300 and flanked by H3K4me1 and H3K27ac marked chromatin, but
were not enriched for H3K27me3 or H3K4me3 (Fig. 1a and
Supplementary Fig. 2a). Genome-wide analysis defined 5,118 genomic
regions (hereafter referred to as class I elements) marked by a
similar chromatin signature (that is, high p300, H3K4me1 and
H3K27ac, low, if any, H3K4me3, and absence of H3K27me3), representing
putative active hESC enhancers (Fig. 1b and Supplementary
Data 1).

Interestingly, in the vicinity of many early developmental genes we
noted promoter-distal p300-bound regions that were marked by
H3K4me1 but, in contrast to the active hESC enhancers, lacked
H3K27ac and were instead enriched for H3K27me3, a modification
associated with polycomb silencing (Fig. 1a). Overall, we identified
2,287 p300-bound regions devoid of H3K27ac and marked by
H3K27me3, which we will hereafter refer to as class II elements
(Fig. 1b and Supplementary Data 1). In general, class II elements
showed enrichment of both H3K27me3 and H3K4me1 flanking
p300 peaks (Fig. 1b). In contrast, analysis of previously described adult
tissue-specific enhancers revealed no enrichment for any of the
interrogated modifications (Supplementary Fig. 2b-e).
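
The two signatures defined above amount to a small decision rule over the marks at a p300-bound region. The sketch below encodes that rule with boolean calls per mark. This is a simplification: the study worked from ChIP-seq enrichment levels, the "low, if any, H3K4me3" criterion for class I is omitted, and the `Region` type and `classify` function are hypothetical names. The example flags for the NANOG and NODAL elements follow the descriptions in the text and the Figure 1 legend.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Presence/absence calls for marks at a candidate distal element."""
    name: str
    p300: bool
    h3k4me1: bool
    h3k27ac: bool
    h3k27me3: bool


def classify(region):
    """Assign a region to class I, class II or neither."""
    # Both classes are p300-bound and flanked by H3K4me1.
    if not (region.p300 and region.h3k4me1):
        return "other"
    # Class I: H3K27ac present, H3K27me3 absent (putative active enhancer).
    if region.h3k27ac and not region.h3k27me3:
        return "class I"
    # Class II: H3K27ac absent, H3K27me3 present.
    if region.h3k27me3 and not region.h3k27ac:
        return "class II"
    return "other"


nanog = Region("NANOG", p300=True, h3k4me1=True, h3k27ac=True, h3k27me3=False)
nodal = Region("NODAL", p300=True, h3k4me1=True, h3k27ac=False, h3k27me3=True)
unmarked = Region("unmarked", p300=False, h3k4me1=False, h3k27ac=False, h3k27me3=False)
labels = [classify(r) for r in (nanog, nodal, unmarked)]
```

Treating H3K27ac and H3K27me3 as mutually exclusive mirrors the definitions in the text; regions carrying both or neither fall outside the two classes in this sketch.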

p300 enrichment levels were comparable at class I and II elements
(Supplementary Fig. 3a), both classes were bound by BRG1
(Supplementary Fig. 3b), and showed similar genomic distribution relative
to annotated transcription start sites (TSS), with over 95% of regions
located away from promoters (Fig. 1c). Moreover, only 1.7% and 3.9%
of class I and class II elements, respectively, overlapped with CpG
islands, in sharp contrast to the 50% overlap observed for promoters.
Another property of enhancers is their relative nucleosomal depletion
compared to the flanking regions. Using FAIRE-seq (formaldehyde-assisted
isolation of regulatory elements coupled to sequencing) we
showed that class I and II elements were comparably nucleosome-depleted
(Supplementary Fig. 3c). Furthermore, examination of a
reported DNA-methylation-sensitive restriction enzyme data set from
hESCs revealed similar levels of DNA hypomethylation at class I and
class II elements (Supplementary Fig. 3d).
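
Mapping each element to its closest TSS and asking what fraction lies away from promoters is a nearest-neighbour lookup on sorted coordinates. The sketch below handles a single chromosome with a binary search. The 2,500-bp promoter window, the function names and all coordinates are assumptions made for illustration; the text does not state the cutoff used.

```python
from bisect import bisect_left


def nearest_tss_distance(position, sorted_tss):
    """Distance in bp from a position to the closest TSS.

    sorted_tss must be a non-empty, ascending list of TSS coordinates
    on the same chromosome as the position.
    """
    i = bisect_left(sorted_tss, position)
    # Only the neighbours on either side of the insertion point can be closest.
    candidates = sorted_tss[max(i - 1, 0): i + 1]
    return min(abs(position - tss) for tss in candidates)


def fraction_distal(positions, sorted_tss, promoter_window=2500):
    """Fraction of elements farther than promoter_window from any TSS."""
    distal = sum(
        nearest_tss_distance(p, sorted_tss) > promoter_window for p in positions
    )
    return distal / len(positions)


# Hypothetical coordinates, for illustration only.
tss_sites = [1000, 50000, 120000]
elements = [1200, 30000, 119000, 200000]
distances = [nearest_tss_distance(p, tss_sites) for p in elements]
distal_fraction = fraction_distal(elements, tss_sites)
```

A genome-wide version would keep one sorted list per chromosome and could also track strand, which this sketch ignores.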

ChIP-seq results were validated by ChIP-qPCR at a representative
subset of class I and class II elements (labelled with the name of their
closest gene) (Supplementary Figs 4a-d and 5). Further examination of
the H3K27ac and H3K27me3 enrichments showed a mutually exclusive
marking pattern at class I and class II elements (Supplementary Fig. 6).
Sequential ChIP-qPCR demonstrated a simultaneous presence of
H3K4me1/K27ac at class I regions, and H3K4me1/K27me3 at class II
regions, indicating that the concurrent enrichments of H3K4me1 and
H3K27me3 were not due to cell population heterogeneity (Fig. 2a, b).
Moreover, consistent with H3K27me3, we observed enrichment of the
PRC2 component, SUZ12, at class II elements (Supplementary Fig. 4e).
We also detected preferential association of RNA POL2 with class I
elements, as compared to class II elements, including its unphosphorylated,
Ser5-phosphorylated and Ser2-phosphorylated forms
(Supplementary Fig. 7a-c).

Next we asked whether transcriptional status of nearby genes differs 
between the two classes. To this end, we analysed hESC transcriptome 
by RNA-seq and examined transcripts originating from TSS closest to 
the elements of each class. Class-I-associated gene expression was sig- 
nificantly higher than expression of all genes, or of class-II-associated 
genes, which were poorly expressed (Fig. 2c). In agreement, class-II- 
associated TSS were enriched for both H3K27me3 and H3K4me3, 
whereas class-I-associated TSS were marked by high H3K4me3 levels 
(Supplementary Fig. 8a, c). Thus, the two classes defined by unique 
chromatin signatures are also distinguished by the transcriptional 
status of associated genes. 

Figure 1 | Unique chromatin signatures distinguish two classes of
regulatory elements in hESCs. a, Genome browser representations of p300,
H3K4me1, H3K27ac, H3K27me3 and H3K4me3 enrichment profiles in hESCs
are shown for a representative class I (for example, NANOG, top) and class II
(for example, NODAL, bottom) element and its flanking regions. The peak
height corresponds to normalized fold enrichments as calculated by QuEST.
b, Average hESC ChIP-seq signal profiles were generated for the indicated
histone modifications around the central position of p300-bound regions, over
class I (top) and class II (bottom) elements, respectively. c, Class I and II
elements were mapped to their closest Ensembl gene TSS and the distribution
of distances between elements and TSS is shown.


To investigate whether the two classes are linked to genes of distinct 
functional annotations, we performed ontology analysis with the 
Genomic Regions Enrichment of Annotations Tool (GREAT)
(Fig. 2d, e and Supplementary Data 2 and 3). Class I elements showed 
association with genes expressed in the epiblast, whose mouse homo- 
logues exhibit knockout phenotypes with defects in pre- and peri- 
implantation development (Fig. 2d). In contrast, class II elements 
are linked to genes expressed at, and essential for, gastrulation, germ- 
layer formation, neurulation and early somitogenesis (including NODAL, 
EOMES, LEFTY2, EN1, as well as FOX, SOX and WNT family mem- 
bers) (Fig. 2e). Notably, we did not observe enrichment of adult-tissue 

Figure 2 | Functional and molecular characterization of class I and II
elements. a, b, Sequential ChIP experiments were performed from hESCs with 
the indicated pairs of histone modification antibodies. ChIP material was 
analysed by qPCR for select class I and class II elements, as well as negative 
control regions (NEG1-3). The y axis shows per cent input recovery; error bars 
represent standard deviation (s.d.) from three technical replicates. c, RNA-seq 
data set was obtained from hESC poly(A)-RNA and reads per kilobase per 
million mapped reads (RPKM) were calculated for all human Ensembl genes. 
RPKMs for all annotated genes (green) or for those closest to class I (red) or 
class II (blue) elements are represented as box plots. P-values were calculated 
using non-paired Wilcoxon tests. In the box plots, bottom and top of the boxes 
correspond to the 25th and 75th percentiles and the internal band is the 50th 
percentile (median). The plot whiskers extending outside the boxes correspond 
to the lowest and highest datum within 1.5 interquartile range of the lower and 
upper quartiles, respectively. d, e, Functional annotation of class I (d) and class 
II (e) elements was performed using GREAT. The top over-represented 
categories belonging to three different ontologies are shown: Mouse Genome 
Informatics (MGI) expression detected (red) contains information on tissue- 
and developmental-stage-specific expression in mouse; Gene Ontology (GO) 
biological process (green) describes the biological processes associated with 
gene function; mouse phenotypes (blue) ontology contains data about mouse 
genotype-phenotype associations. The x-axis values (in logarithmic scale)
correspond to the binomial raw (uncorrected) P-values. 

categories among class-II-linked genes, indicating no association with
late enhancers.
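
The Figure 2 legend describes two calculations that are easy to make concrete: reads per kilobase per million mapped reads (RPKM), and box plots whose whiskers extend to the lowest and highest datum within 1.5 interquartile ranges of the quartiles. The sketch below is a minimal illustration. The function names and numbers are hypothetical, and the "inclusive" quartile method is one of several common percentile definitions, not necessarily the one used for the figure.

```python
from statistics import quantiles


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def box_plot_stats(values):
    """Quartiles plus whiskers at the most extreme data within 1.5 x IQR."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at real data points; values beyond the fences are outliers.
        "whisker_low": min(v for v in values if v >= lower_fence),
        "whisker_high": max(v for v in values if v <= upper_fence),
    }


# Hypothetical numbers, for illustration only.
example_rpkm = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=10_000_000)
example_stats = box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
```

In the toy data set the value 100 lies beyond the upper fence, so the upper whisker stops at 8 and 100 would be drawn as an outlier.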

Taken together, our results suggest that class II elements represent
poised enhancers, which reveal their cell-type-dependent activity during
development. One prediction from this hypothesis is that upon differentiation
to a specific fate, a subset of poised enhancers linked to genes
induced in this fate should acquire an active, class I signature. To test this
prediction, we differentiated hESCs into neuroectodermal spheres
(hNECs), generated p300, H3K4me1, H3K27ac and H3K27me3 profiles
by ChIP-seq, and identified genomic elements that were marked by
class II signature in hESCs, but acquired a strong enrichment of
H3K27ac in hNECs (195 unique regions, Supplementary Data 1).
Histone modification profiling over these regions showed concomitant
decrease in H3K27me3 (Fig. 3a, b and Supplementary Fig. 9a) and we

Figure 3 | A subset of class II elements acquires active enhancer chromatin
signature upon neuroectodermal differentiation. a, Average hNEC ChIP-seq
signal profiles were generated for the indicated histone modifications around
the central position of those p300-bound regions (as determined in hESC) that
acquired H3K27ac enrichment in hNECs (that is, class II→I elements).
b, Genome browser representation of p300, H3K4me1, H3K27ac and
H3K27me3 (in hESCs and hNECs) binding profiles at a representative class
II→I element. The peak height corresponds to normalized fold enrichments as
calculated by QuEST. c-e, ChIP-qPCR analyses from hNECs with indicated
histone modification antibodies at select elements including: class I elements
that were only active in hESCs (active ESC), or in both hESCs and hNECs

refer to them hereafter as class II→I elements. Of note, a large number of
the remaining class II regions (that is, those that did not acquire
H3K27ac) retained H3K4me1 and H3K27me3 signature in hNECs,
but showed diminished p300 occupancy (Supplementary Fig. 9b-d).

The aforementioned observations were validated by ChIP-qPCR for
a representative subset of enhancers (Fig. 3c-e). We further showed
that class II→I elements acquired RNA POL2 enrichment in hNECs,
whereas hESC-specific active enhancers showed diminished RNA
POL2 binding (Supplementary Fig. 10a). In agreement with a report
documenting short bidirectional transcripts originating from enhancers,
we detected an increased level of bidirectional transcription from class
II→I elements upon differentiation to hNECs, whereas transcripts
originating from NANOG and OCT4 enhancers were downregulated
(Supplementary Fig. 10b, c).

(active both), or class II elements that did not acquire H3K27ac in hNEC (class
II), or class II→I elements. The y axis shows per cent input recovery; error bars
represent s.d. from three technical replicates. ChIPs used in these qPCRs
represent biological replicates of those samples used in ChIP-seq. f, RNA-seq
data sets from hESC and hNEC poly(A)-RNA were used to calculate the RPKM
for all human Ensembl genes. RPKMs in both cell types are represented as box
plots for all genes (All), genes linked to class I elements, genes linked to class II
elements, and genes linked to class II→I elements. P-values were calculated
using paired (NEC class II→I versus ESC class II→I) or non-paired (NEC class
II→I versus NEC class II) Wilcoxon tests.

GREAT annotation of class II→I elements showed association with
genes expressed in neuroectoderm and related to abnormalities in
nervous system development (Supplementary Fig. 11 and Supplementary
Data 4). In agreement, hNEC RNA-seq transcriptome analysis
revealed significant upregulation of class-II→I-associated genes upon
differentiation, whereas expression of the remaining class-II-associated
genes was persistently low (Fig. 3f). Moreover, H3K27me3 levels at
class-II→I-associated TSS were diminished and H3K4me3 levels
induced in hNECs as compared to hESCs, whereas modification profiles
over TSS associated with the remaining class II elements were
relatively unchanged (Supplementary Fig. 8b, d).

To examine if upon differentiation class II→I elements acquire the
ability to drive gene expression, we infected hESCs with lentiviruses
encoding a green fluorescent protein (GFP) reporter under the control
of select class II→I (for example, SOX2, HES1), class I (for example,
CD9, JARID2) and class II elements (for example, EOMES, MYF5) and
monitored GFP fluorescence at day 1, 5 and 7 of differentiation to
hNECs (Supplementary Table 1 and Supplementary Fig. 12). Class
II→I reporters showed low, if any, fluorescence levels in hESCs, but
were induced at day 5 of differentiation, whereas class I reporters
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displayed a reverse pattern. Our results are consistent with class II 
elements representing poised developmental enhancers, which upon 
differentiation acquire, in a cell-type-dependent manner, the properties 
of active enhancers. 

To test whether class II elements indeed function as developmental 
enhancers, we examined their activity during embryogenesis. Sequence 
conservation analysis revealed that class II elements are evolutionarily 
constrained and display a higher degree of conservation than class I 
elements (Supplementary Fig. 13a). VISTA enhancer browser search7" 
identified fourteen class II elements for which enhancer activity was 
previously assayed at embryonic day 11.5 of mouse development. In 
nine cases, highly specific expression patterns were noted (Supplementary 
Table 2). Interestingly, two enhancers (WNT8B, CDH2) belong to 
class II→I and, in agreement, drive gene expression specifically 
in neuroectoderm-derived structures in the mouse (Supplementary 
Table 2). 

Next we screened enhancer activity of a select set of class II elements 
using zebrafish embryo transgenic reporter assay~*”’. Selected elements 
correspond to previously uncharacterized human genomic sequences 
(except for WNT8B) that are located in proximity to genes whose 
zebrafish homologues have known expression patterns, although the 
elements themselves are generally not well conserved in the zebrafish 
genome (Supplementary Figs 13 and 14). GFP reporters were injected 
into one-cell-stage embryos and fluorescence was monitored through- 
out fish embryogenesis (Supplementary Fig. 15). For eight out of nine 
assayed class II reporters, specific and reproducible GFP patterns were 
observed at distinct developmental stages and anatomical locations 
(Fig. 4a-f, Supplementary Fig. 14 and Supplementary Table 3). 

A first subgroup of assayed elements (for example, NODAL, EOMES, 
LEFTY2) drove gastrulation-specific expression at the shield, the fish 
equivalent of mouse primitive groove (Fig. 4a and Supplementary Fig. 
14). Although none of the three tested sequences is well conserved in 
fish, proximal genes NODAL, EOMES and LEFTY2 are conserved 
across vertebrates, with shield-specific expression pattern of zebrafish 
NODAL and LEFTY2 homologues” (Supplementary Figs 14 and 16a). 
From mice to frogs, EOMES expression is initially restricted to the 
primitive groove and blastopore lip, respectively*”®, but the zebrafish 
EOMES homologue is only expressed at later stages (ZFIN database, 
identifier ZDB-PUB-051025-1). Remarkably, the element representing 
a putative EOMES enhancer drives shield-specific expression, 
indicating responsiveness of this human sequence to zebrafish gastrulation 
circuitry. 


Figure 4 | Class II elements have developmental enhancer activity in vivo. 
a, Merged bright-field and GFP images are shown for representative shield 
stage zebrafish embryos injected with class II elements proximal to human 
EOMES, LEFTY2 and NODAL. For the EOMES enhancer, dorsal (anterior to 
top) and lateral (shield to right) views are presented in the left and right panels, 
respectively. For LEFTY2 and NODAL, animal pole (shield to top) and lateral 
(shield to right) views are presented in the left and right panels, respectively. 
White arrows indicate the location of the shield in each image. A, anterior; D, 
dorsal. Scale bar, 150 μm. b-f, Merged bright-field and GFP images are shown 
for representative 24-28 h.p.f. zebrafish embryos injected with class II elements 
proximal to SOX2 (b), EN1 (c), NKX2-1 (d), WNT8B (e) and MIXL1 (f) genes. 
In b-e, schematics highlighting the relevant anatomical structures where GFP 
expression was reproducibly observed are shown on the left, and three images 
correspond, from left to right and top to bottom, to whole-embryo flattened 
dorsal views, dorsal anterior views and lateral anterior views, respectively. In f, a 
lateral posterior view is shown. In b-f, scale bar = 150 μm. MHB, midbrain- 
hindbrain boundary. g, Proposed model for enhancer bookmarking during 
early embryonic development. Poised developmental enhancers (class II) are 
marked by a unique chromatin signature, involving occupancy of chromatin 
modifiers p300, BRG1 and PRC2 and nucleosomal regions marked by 
H3K4me1 and H3K27me3. During differentiation, appropriate developmental 
and signalling cues are able to rapidly transition these poised, pre-marked 
enhancers into an active state represented by the acquisition of H3K27ac, RNA 
POL2 binding, recruitment of tissue-specific transcription factors (TFs) and 
loss of H3K27me3, leading to the establishment of tissue-specific gene 
expression patterns. 

A second subgroup of class II reporters (for example, SOX2, NKX2-1, 
EN1, WNT8B, MIXL1) drove GFP expression at later developmental 
stages (24-28 h post fertilization (h.p.f.)) (Fig. 4b-f); this expression 
was restricted to specific anatomical structures such as the midbrain- 
hindbrain boundary (EN1)” or the ventral diencephalon/hypothalamus 
(NKX2-1)**. Again, despite the low degree of sequence conservation in 
fish (Supplementary Fig. 13), observed GFP patterns were generally 
consistent with the reported expression of the putative target gene 
homologues**” (Supplementary Fig. 16b-d). 

Importantly, specificity of our results was validated with an extensive 
set of control regions, including: (1) five class I elements; (2) four non- 
conserved genomic regions flanking select analysed class II elements; 
(3) four human adult tissue-specific enhancers; (4) three randomly 
selected intergenic non-conserved regions; (5) empty vector (Supplementary 
Table 4). All control regions showed only weak, diffuse 
and nonspecific GFP patterns from 6 h.p.f. to 5 d.p.f. (Supplementary 
Figs 17-21). It is worth mentioning that based on our limited analysis, 
class I elements active in hESCs do not appear to drive pre-specification 
expression in zebrafish. Finally, to address whether expression patterns 
driven by class II elements are dynamic, we monitored several reporters 
(LEFTY2, SOX2, EN1, NKX2-1) throughout embryogenesis for up to 
5 d.p.f. In all cases, GFP patterns were transient in nature, with fluor- 
escence signals barely detectable after 3 d.p.f. (Supplementary Figs 17-21), 
further underscoring that class II regions represent dynamically regu- 
lated developmental enhancers. 

We uncovered a unique chromatin signature that bookmarks early 
developmental enhancers in pluripotent cells, likely to prime them for 
a response to signalling and developmental cues (Fig. 4g). In addition 
to novel insights into gene regulation, our study identified a set of over 
2,000 putative regulatory sequences, thereby creating an invaluable 
resource for lineage tracking and isolation of transient cell populations 
representing early steps of human development. 


METHODS SUMMARY 

ChIP-seq. Approximately 10^7 hESCs or hNECs were used for each ChIP experiment. 
Cells were crosslinked with 1% formaldehyde for 10 min at 25 °C, chromatin 
was sonicated and immunoprecipitated with 3-5 μg of antibody. Sequencing libraries 
were prepared according to Illumina protocol from: hESC and hNEC p300 ChIP, 
hESC BRG1 ChIP, hESC FAIRE, hESC and hNEC H3K4me3 ChIP, hESC and hNEC 
H3K4me1 ChIPs, hESC and hNEC H3K27me3 ChIPs, hESC and hNEC H3K27ac 
ChIPs, hESC and hNEC input DNAs. Libraries were sequenced using Illumina 
Genome Analyser and resulting sequence reads mapped by ELAND (Illumina 
Inc.) and analysed by QuEST 2.4 (ref. 30). 


Full Methods and any associated references are available in the online version of 
the paper at www.nature.com/nature. 
METHODS 

hESC culture. hESCs (H9 line, WiCell) were expanded in feeder-free, serum-free 
medium (mTeSR1, StemCell Technologies). Cells were passaged 1:7 every 
5-6 days by incubation with accutase (Invitrogen) and resultant small cell clusters 
(50-200 cells) were subsequently re-plated on tissue culture dishes coated over- 
night with growth-factor-reduced matrigel (BD Biosciences). hESC quality was 
regularly tested by evaluating the expression of a panel of hESC markers (for 
example, alkaline phosphatase, OCT4) and the capacity to differentiate into cell 
types derived from the three germ layers. 

Neuroectoderm cell (NEC) differentiation. hESCs were differentiated into 
hNECs using a previously described differentiation protocol”’. Briefly, hESCs were 
incubated with 2 mg ml⁻¹ collagenase. Once detached, cells were plated in NEC 
differentiation media: 1:1 neurobasal medium/DMEM F-12 medium (Invitrogen), 
0.5× B-27 supplement minus vitamin A (50× stock, Invitrogen), 0.5× N-2 supplement 
(100× stock, Invitrogen), 20 ng ml⁻¹ bFGF (Peprotech), 20 ng ml⁻¹ EGF 
(Sigma-Aldrich), 5 μg ml⁻¹ bovine insulin (Sigma-Aldrich), 0.1 μg ml⁻¹ recombinant 
human NOGGIN (Peprotech), 1× Glutamax-I supplement (100× stock, 
Invitrogen). Cells were differentiated for 7 days, changing media every other day. 
Chromatin immunoprecipitation (ChIP), sequential ChIP, FAIRE and antibodies. 
ChIP assays were performed from approximately 10^7 hESCs or hNECs per experiment, 
according to a previously described protocol with slight modifications*’. 
Briefly, cells were crosslinked with 1% formaldehyde for 10 min at room temper- 
ature and formaldehyde was quenched by addition of glycine to a final concen- 
tration of 0.125 M. Chromatin was sonicated to an average size of 0.5-2 kb, using 
Bioruptor (Diagenode). A total of 3-5 μg of antibody was added to the sonicated 
chromatin and incubated overnight at 4 °C. 10% of chromatin used for each ChIP 
reaction was kept as input DNA. Subsequently, 75 μl of protein A or protein G 
Dynal magnetic beads (depending on antibody species and Ig isotype) were added 
to the ChIP reactions and incubated for four additional hours at 4 °C. Magnetic 
beads were washed and chromatin eluted, followed by reversal of crosslinks 
and DNA purification. Resultant ChIP DNA was dissolved in water. 

Sequential ChIPs were performed as previously described with slight modifica- 
tions’. Chromatin was prepared as described above for ChIP and after addition of 
the first antibody (3-5 μg) and corresponding washes, magnetic beads were resuspended 
in 75 μl TE/10 mM DTT. Samples were diluted 20 times with dilution 
buffer (1% Triton X-100, 2 mM EDTA, 20 mM Tris-HCl pH 8, 150 mM NaCl) and 
second antibody (3-5 μg) was added to each reaction. Beads were then washed, 
crosslinking reversed and DNA purified and dissolved in water. 

For FAIRE, sonicated chromatin was prepared as for ChIP and DNA was 
extracted as previously described'® 

All antibodies used in this study have been previously reported as suitable for 
ChIP: p300 (sc-585, Santa Cruz Biotechnology)’, BRG1 (clone JA1, a gift from G. 
Crabtree)’, H3K4me1 (ab8895, Abcam)°, H3K27ac (ab4729, Abcam)°, H3K4me3 
(39159, Active Motif)**, H3K27me3 (39536, Active Motif)**, RNA POL2 unpho- 
sphorylated (8WG16 clone, MMS-126R, Covance)**, RNA POL2 ser5P (ab5131, 
Abcam)’, RNA POL2 ser2P (ab5095, Abcam)’, normal rabbit IgG (12-370, 
Millipore). 

ChIP-qPCR. All primers used in qPCR analysis are shown in Supplementary Data 
5. Primers are named after proximal putative target genes of the investigated 
enhancers. For each tested genomic element, two sets of primers were used, one 
set overlapping the peak of maximal p300 enrichment (central primers) and 
another set overlapping flanking regions with histone modification enrichments 
(flanking primers). This strategy was used because p300 peaks typically occurred 
within nucleosome-poor regions. qPCR analysis was performed on a LightCycler 
480 II instrument (Roche) using technical triplicates, and ChIP-qPCR signals were 
calculated as percentage of input. Standard deviations were measured from the 
technical triplicate reactions and represented as error bars. 
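
As an illustration of the percentage-of-input calculation, the following minimal Python sketch converts qPCR threshold cycles (Ct) into per cent input recovery. It assumes perfect doubling per cycle and an input sample corresponding to 10% of the ChIP chromatin; the function names and Ct values are hypothetical and are not taken from the study.

```python
import statistics


def percent_input(ct_chip, ct_input, input_fraction=0.10, efficiency=2.0):
    """Per cent of input recovered in a ChIP sample.

    Assumes amplification by `efficiency` per cycle (2.0 = perfect doubling)
    and that the input sample is `input_fraction` of the ChIP chromatin.
    """
    return 100.0 * input_fraction * efficiency ** (ct_input - ct_chip)


def summarize_triplicate(ct_chip_wells, ct_input, input_fraction=0.10):
    """Mean and sample s.d. of per cent input across technical replicate wells."""
    values = [percent_input(ct, ct_input, input_fraction) for ct in ct_chip_wells]
    return statistics.mean(values), statistics.stdev(values)


# Hypothetical Ct values for one primer pair (not data from the study).
mean_pct, sd_pct = summarize_triplicate([26.9, 27.0, 27.1], ct_input=25.0)
print(f"{mean_pct:.2f}% input, s.d. {sd_pct:.2f}")
```

The s.d. here is computed on the converted per cent input values rather than on raw Ct values, which is one common choice for error bars; the source does not state which was used.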

RT-qPCR of enhancer RNAs. To assess levels of enhancer-associated transcription, 
total RNA from hESCs and hNECs differentiated for 7 days was isolated 
using Trizol reagent followed by ethanol precipitation according to the manufacturer's 
protocol (Invitrogen). To remove genomic DNA contaminants, RNA was subjected to 
rigorous DNase treatment with the Turbo DNA-free kit (two 30-min 
incubations at 37 °C). cDNA was generated from 100 ng of DNA-free RNA using 
the QuantiTect Reverse Transcription Kit (Qiagen) with two modifications: (1) 
the gDNA elimination reaction was extended for 5 min and (2) the reverse transcription 
elongation time was 30 min. Quantitative PCR (qPCR) primers were 
designed (Supplementary Data 5) to target regions surrounding the p300 peaks 
that defined each tested enhancer. qPCR runs and analysis were performed on the 
LightCycler 480 II instrument (Roche). To calculate fold change between hESCs 
and hNECs, the ΔΔCt method was used, with 18S rRNA transcripts as the 
normalization control. Standard deviations were measured from technical triplicate 
reactions and were represented as error bars. Biological replicate experiments for 
hNECs were performed and very similar results were obtained (data not shown). 
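
The fold-change calculation can be sketched as follows. This is a minimal Python illustration of the standard ΔΔCt formula, assuming perfect doubling per cycle, with 18S rRNA as the reference transcript and hESCs as the baseline sample; the function name and Ct values are hypothetical.

```python
def fold_change_ddct(ct_target_nec, ct_ref_nec, ct_target_esc, ct_ref_esc):
    """Fold change in hNECs relative to hESCs by the delta-delta-Ct method.

    Each target Ct is first normalized to the reference transcript
    (18S rRNA here) from the same sample; the hESC value is the baseline.
    """
    delta_nec = ct_target_nec - ct_ref_nec
    delta_esc = ct_target_esc - ct_ref_esc
    return 2.0 ** -(delta_nec - delta_esc)


# Hypothetical Ct values: the target crosses threshold 3 cycles earlier in
# hNECs while the reference is unchanged, which corresponds to a 2^3 increase.
print(fold_change_ddct(ct_target_nec=24.0, ct_ref_nec=10.0,
                       ct_target_esc=27.0, ct_ref_esc=10.0))
```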


ChIP-seq. Libraries were prepared from: hESC and hNEC p300 ChIP, hESC 
BRG1 ChIP, hESC FAIRE, hESC and hNEC H3K4me3 ChIP, hESC and hNEC 
H3K4me1 ChIPs, hESC and hNEC H3K27me3 ChIPs, hESC and hNEC H3K27ac 
ChIPs, hESC and hNEC input DNAs. ChIP-seq, FAIRE-seq and input libraries 
were prepared according to Illumina protocol and sequenced using Illumina 
Genome Analyser. All sequences were mapped by ELAND software (Illumina 
Inc.) and analysed by QuEST 2.4 software****. ChIP-seq enrichment regions for 
the following profiled proteins were determined using the indicated settings, 
according to QuEST recommendations: hESC p300: KDE (kernel density estima- 
tion) bandwidth = 30, ChIP seeding fold enrichment = 30, ChIP extension fold 
enrichment = 3, ChIP-to-background fold enrichment = 3; hESC H3K4me3: 
KDE bandwidth = 60, ChIP seeding fold enrichment = 30, ChIP extension fold 
enrichment = 3, ChIP-to-background fold enrichment = 3; hESC H3K4me1: 
KDE bandwidth = 100, ChIP seeding fold enrichment = 10, ChIP extension fold 
enrichment = 3, ChIP-to-background fold enrichment = 2.5; hESC H3K27me3: 
KDE bandwidth = 100, ChIP seeding fold enrichment = 10, ChIP extension fold 
enrichment = 8, ChIP-to-background fold enrichment = 2.5; hESC and hNEC 
H3K27ac: KDE bandwidth = 100, ChIP seeding fold enrichment = 10, ChIP 
extension fold enrichment = 3, ChIP-to-background fold enrichment = 2.5. 

For all ChIP-seq data sets, WIG files were generated with QuEST and 
subsequently used for visualization purposes and for obtaining average signal 
profiles. 

RNA-seq. RNAs from hESCs and hNECs were extracted with Trizol (Invitrogen), 
following the manufacturer’s recommendations. 10 μg of total RNA was subjected 
to two rounds of oligo-dT purification using Dynal oligo-dT beads (Invitrogen). 
100 ng of the purified RNA was fragmented with 10× fragmentation buffer 
(Ambion). Fragmented RNA was used for first-strand cDNA synthesis, using 
random hexamer primers (Invitrogen) and SuperScript II enzyme (Invitrogen). 
Second strand cDNA was obtained by adding RNaseH (Invitrogen) and DNA 
Pol I (New England Biolabs) to the first strand cDNA mix. The resulting double- 
stranded cDNA was used for Illumina library preparation as described for ChIP-seq 
experiments. 

RNA-seq libraries were sequenced with Illumina Genome Analyser and both 
mapping and analysis of resulting reads were performed with DNAnexus software 
tools (https://dnanexus.com). Reads per kilobase per million mapped reads 
(RPKM) were calculated for all human Ensembl genes. The specificity and quality 
of our RNA-seq data can be visualized at several hESC- or hNEC-specific genes 
(Supplementary Fig. 22). 
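
For reference, RPKM can be computed as in the minimal Python sketch below. It uses the standard definition (reads per kilobase of transcript per million mapped reads); the exact implementation in the DNAnexus tools is not described in the source, and the function name and numbers are hypothetical.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase per million mapped reads for one gene.

    read_count         : reads mapped to the gene
    gene_length_bp     : gene (exonic) length in base pairs
    total_mapped_reads : all mapped reads in the library
    """
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


# Hypothetical gene: 500 reads on a 2-kb transcript in a library of
# 10 million mapped reads.
print(rpkm(500, 2000, 10_000_000))
```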

Class I and class II element selection criteria. ChIP-seq enrichment regions as 
determined by QuEST were used to define class I and class II elements (Supplementary 
Data 1). To this end, operations (intersection, subtraction, and so 
on) between genomic data sets were performed with GALAXY 
(http://main.g2.bx.psu.edu/) and the following selection criteria were used. Class I elements 
(5,518 regions): genomic regions with hESC p300 enrichment (ChIP seeding fold 
enrichment >30), located within 2 kb of regions enriched in hESC H3K4me1 and 
H3K27ac (ChIP seeding fold enrichment >10 for both modifications); to 
distinguish these elements from proximal promoters, these regions were further 
required not to overlap with hESC H3K4me3 (ChIP seeding fold enrichment 
>30). Class II elements (2,287 regions): genomic regions with hESC p300 enrichment 
(ChIP seeding fold enrichment >30), located within 2 kb of regions enriched 
in hESC H3K27me3 (ChIP seeding fold enrichment >8); these regions were 
further required not to overlap with hESC H3K4me3 (ChIP seeding fold enrichment 
>30) or hESC H3K27ac (ChIP seeding fold enrichment >10). Class II→I 
elements (195 regions): class II elements (as determined in hESCs) that 
acquired H3K27ac enrichment in hNECs (H3K27ac ChIP seeding fold enrichment 
>10, within 2 kb of p300 peaks defining class II elements). 
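
The selection logic above can be sketched in a few lines of Python. This is only an illustration, not the GALAXY workflow used in the study: it assumes that each mark is supplied as a list of (chromosome, start, end) regions that already pass the stated seeding fold-enrichment cutoffs, that coordinates are half-open, and that class I is tested before class II. All function names and coordinates are hypothetical.

```python
def overlaps(a, b):
    """True if two (chrom, start, end) half-open regions share at least 1 bp."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def within(a, b, max_gap):
    """True if two regions on the same chromosome are at most max_gap bp apart."""
    if a[0] != b[0]:
        return False
    gap = max(a[1], b[1]) - min(a[2], b[2])  # negative when the regions overlap
    return gap <= max_gap


def classify_p300_peak(peak, k4me1, k27ac, k27me3, k4me3, window=2000):
    """Return 'class I', 'class II' or None for one hESC p300 peak."""
    if any(overlaps(peak, r) for r in k4me3):
        return None  # promoter-like region, excluded from both classes
    near_k4me1 = any(within(peak, r, window) for r in k4me1)
    near_k27ac = any(within(peak, r, window) for r in k27ac)
    if near_k4me1 and near_k27ac:
        return "class I"
    near_k27me3 = any(within(peak, r, window) for r in k27me3)
    if near_k27me3 and not any(overlaps(peak, r) for r in k27ac):
        return "class II"
    return None


def gains_h3k27ac(peak, nec_k27ac, window=2000):
    """True if a class II peak lies within `window` bp of hNEC H3K27ac."""
    return any(within(peak, r, window) for r in nec_k27ac)


# Hypothetical coordinates for a single p300 peak.
peak = ("chr1", 10_000, 10_200)
print(classify_p300_peak(peak, k4me1=[], k27ac=[],
                         k27me3=[("chr1", 10_150, 12_000)], k4me3=[]))
```

A class II peak for which `gains_h3k27ac` returns true would correspond to the subset of class II elements that acquire H3K27ac in hNECs.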

In total, we identified 11,543 regions marked by p300 and H3K4me1 in hESCs, 
of which 1,639 did not contain H3K27ac, H3K27me3 or H3K4me3 enrichment. A 
total of 3,531 regions were enriched for p300, H3K4me1 and H3K4me3 (those 
generally corresponded to proximal promoters). 

Please note that although our definition of class II elements does not use an 
H3K4me1 enrichment filter, about 55% of class II regions are enriched for 
H3K4me1 at ChIP seeding fold enrichment >10; when a lower cutoff is allowed, 
the overlap is substantially greater. Thus, the vast majority, if not all, of class 
II elements probably contain above-background levels of H3K4me1, as exemplified 
by the observation that class II elements with ChIP-seq H3K4me1 levels below the 
seeding fold enrichment >10 cutoff are still substantially enriched for H3K4me1 
when assayed by ChIP-qPCR (see Supplementary Fig. 5, for example, CHD2, 
EPHA4, GPR19, ADRA2A, KLF5, EML1 regions). 

Other sequencing data analyses. Average ChIP-seq signal profiles around the 
centre of p300-enriched regions were generated with the Sitepro tool, part of the 
Cistrome Analysis pipeline (http://cistrome.dfci.harvard.edu/ap/), using the cor- 
responding WIG files generated with QuEST. Similarly, ChIP-seq signal profiles 
were generated around gene TSS. For genes associated with the different classes of 
distal elements, each element was linked to its closest gene, based on the distance to 
TSS, and considering a maximum distance of 100 kb. 
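
The element-to-gene assignment can be illustrated with the minimal Python sketch below. It assumes that distance is measured from a single reference position per element (for example, the centre of the p300 peak, which the source does not specify) to each TSS; the function name, gene names and coordinates are hypothetical.

```python
def closest_gene(chrom, position, tss_table, max_distance=100_000):
    """Return (gene, distance) for the nearest TSS within max_distance bp.

    tss_table is an iterable of (gene, chrom, tss_position) tuples.
    Returns None when no TSS on the same chromosome is close enough.
    """
    best = None
    for gene, tss_chrom, tss_pos in tss_table:
        if tss_chrom != chrom:
            continue
        distance = abs(tss_pos - position)
        if distance <= max_distance and (best is None or distance < best[1]):
            best = (gene, distance)
    return best


# Hypothetical TSS annotations.
tss_table = [("GENE_A", "chr1", 50_000),
             ("GENE_B", "chr1", 400_000),
             ("GENE_C", "chr2", 60_000)]
print(closest_gene("chr1", 120_000, tss_table))
```

Elements with no TSS within 100 kb are left unassigned (the function returns `None`), so they do not contribute to any gene group.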

Average PhastCons scores profiles around the centre of p300-enriched regions 
were generated with the Conservation/Aggregate Datapoints tool, part of the 
Cistrome Analysis pipeline (http://cistrome.dfci.harvard.edu/ap/). 

Distance between enhancers and their closest Ensembl gene TSS was calculated 
using PinkThing software (http://pinkthing.cmbi.ru.nl/) and Ensembl 52 assembly. 
With this information, it was possible to calculate the overall genomic distribution, 
based on distance to TSS, for the different enhancer groups and to assign enhancers 
to their closest genes. 

Functional annotation of enhancers was obtained with GREAT 
(http://great.stanford.edu/public/html/input.php), using the Basal plus extension association 
rules and the whole human genome as background. 

For RNA-seq data analysis, each enhancer was assigned to its closest gene based 
on distance to TSS, considering a maximum distance of 100 kb, resulting in various 
gene groups each corresponding to an enhancer class (for example, class I, class II, 
class II→I). Statistical significance (P-values) of the difference in expression levels 
between different gene groups was calculated using a two-sample one-sided 
Wilcoxon test (R software, http://www.r-project.org). Paired or non-paired tests 
were performed when the same or different genes were compared, respectively. 
Box plots representing RPKM distribution were generated with R 
(http://www.r-project.org). 

MRE-seq (methylation-sensitive restriction enzyme) data for hESCs was 
obtained from the GEO data set public repository under accession number 
GSM450236. 

In vitro enhancer reporter assays in hESCs and hNECs. Representative class I, 
class II→I and class II elements (Supplementary Table 1) were cloned into a 
lentiviral vector (Sin-minTK-eGFP) in front of a minimal TK promoter driving 
GFP expression. hESC colonies were transduced with the appropriate lentiviruses 
and GFP fluorescence levels were subsequently monitored in undifferentiated 
hESCs, as well as in the course of hNEC differentiation (at days 1, 5 and 7 after 
induction of differentiation). 

Zebrafish reporter assays. The biological relevance of the identified human 
enhancers was evaluated using Tol2 transposon-mediated transgenesis in zebra- 
fish**. Selected human enhancers were PCR amplified and cloned in the pT2HE 
vector (gift from D. M. Kingsley), upstream of the hsp70 promoter and eGFP. Tol2 
transposase was in vitro transcribed using the mMESSAGE mMACHINE SP6 kit 
(Ambion), according to the manufacturer’s instructions. It is worth mentioning 
that the hsp70 promoter independently drives robust and stable expression in the 
lens after 28-38 h.p.f.». This lens signal is also observed when additional sequences 
are placed upstream of the minimal hsp70 promoter, acting as a positive control for 
correct transgenesis. Vector DNA, with corresponding enhancers, and transposase 
RNA were mixed and injected in one-cell-stage zebrafish embryos as previously 
described. eGFP expression patterns were typically monitored at three different 
developmental times: 6-8 h.p.f., 10-14 h.p.f. and 24-28 h.p.f. According to ref. 24, 
using the described reporter assay method, 10-20% of the injected embryos are 
expected to display consistent and representative expression patterns. Because 50 
embryos were typically injected, expression patterns were considered 
representative for a given enhancer if displayed by at least 5-10 embryos within each 
batch (the remaining embryos typically showed nonspecific or no 
fluorescence). For those enhancers with identifiable and consistent expression 
patterns, a second set of injections (biological replicate) was performed for 50 
additional embryos, and in all cases the results were similar to those of the 
first injections. 
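
The embryo-count criterion follows directly from the expected 10-20% frequency. The minimal Python sketch below reproduces the arithmetic; it assumes that the lower bound (10% of injected embryos) is the cutoff for calling a pattern representative, which is one reading of "at least 5-10 embryos". The function names are hypothetical.

```python
def expected_embryo_range(n_injected, low_pct=10, high_pct=20):
    """Number of embryos expected to show a consistent pattern (rounded down)."""
    return n_injected * low_pct // 100, n_injected * high_pct // 100


def is_representative(n_with_pattern, n_injected=50, min_pct=10):
    """True if enough embryos in a batch show the pattern (assumed 10% cutoff)."""
    return n_with_pattern * 100 >= min_pct * n_injected


# For a typical batch of 50 injected embryos.
print(expected_embryo_range(50))
print(is_representative(7))
```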

Initial monitoring and embryo imaging were performed with a Leica M205 FA 
fluorescent stereoscope. High-resolution images presented in Fig. 4 were obtained 
with a Leica DM4500 B upright compound microscope. 

Although live embryos were typically monitored and imaged, in order to obtain 
flat whole-embryo images, selected embryos were fixed and the yolk removed. 
Briefly, 24-28 h.p.f. embryos were dechorionated and transferred to 4% para- 
formaldehyde solution in PBS. After overnight rocking at 4 °C, fixed embryos 
were washed and stored in methanol at −20 °C until ready to use. 

Specificity of our reporter assays was validated by assaying an extensive set of 
negative controls (Supplementary Table 4): (1) five class I elements; (2) four non- 
conserved genomic regions in proximity of four of the tested class II elements; (3) 
four human adult-tissue-specific enhancers that should not drive expression 
during early developmental stages; (4) three randomly selected intergenic non- 
conserved regions; (5) empty vector. 

In addition, four selected class II elements were followed up to 5 days 
post-fertilization, together with their corresponding flanking non-conserved regions 
and additional negative controls. GFP patterns were monitored after 6 h.p.f., 
24 h.p.f., 3 d.p.f. and 5 d.p.f. For the class II elements, embryos 
showing specific patterns at the corresponding stage (for example, 6 h.p.f. for 
LEFTY2 and 24 h.p.f. for SOX2, EN1 and NKX2-1) were selected and their GFP 
patterns subsequently monitored. For the negative controls, embryos were followed 
once the lens signal appeared (that is, once they were confirmed as transgenic). 